Molecular & Cellular Proteomics 5:935-948, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.

From the Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, S-75123 Uppsala, Sweden
| ABSTRACT |
|---|
ΔM between the molecular masses of the modified and unmodified peptides, whereas the retention time difference ΔRT between their elution in reversed-phase liquid chromatography provides an additional dimension for PTM identification. Abundant sequence information obtained with complementary fragmentation techniques using ion-neutral collisions and electron capture often locates the modification to a single residue. The (ΔM, ΔRT) maps are representative of the proteome and its overall modification state and may be used for database-independent organism identification, comparative proteomic studies, and biomarker discovery. Examples of newly found modifications include +12.000 Da (+C atom) incorporation into proline residues of peptides from proline-rich proteins found in human saliva. This modification is hypothesized to increase the known activity of the peptide.
To reduce the analysis time and the false positive and negative rates, a typical database search focuses upon a few types of modifications, far fewer than the broad variety that can potentially be present in the sample. A database search strategy is limited by nature, and although major improvements have been made over the past couple of years (8), most of the acquired tandem mass spectra remain unidentified through these searches. In a typical LC/MS proteomic-type analysis, the identification success rate usually varies between 5 and 15% (9). Even with FTMS, which provides ppm mass accuracy and can use two complementary fragmentation techniques (collisionally activated dissociation (CAD) and electron capture dissociation (ECD) (10)), no more than 30% of MS/MS datasets produce positive identifications (11).
Part of the unidentified mass spectra may be due to unexpected modifications. Fig. 1 shows an example of an endogenous peptide from a human saliva sample whose sequence Mascot suggested as WAPGGQQSSQ from an unnamed human protein. Although five identified fragments deviated from their theoretical values by less than 11 mDa (Fig. 1, C and D) and the data quality was good (the S-score value (11) was four, well above the threshold value of two), the dataset received a Mascot score (M-score) of 18, below the threshold value of 41. A database search using several common variable modifications did not provide a better answer. Subsequently a ModifiComb search (see below) identified the peptide as a modified version of another peptide that eluted from the nano-LC column some 9 min earlier and was 12.000 Da lighter (the survey spectrum integrated over the 9-min time interval is depicted in Fig. 1A); Mascot identified this lighter peptide as GPPQQGGHQQ (Fig. 1B) with an M-score of 47. Note that the masses of all 12 identified fragments were internally consistent, with experimentally measured masses deviating from the theoretical values by less than 5 mDa and with the deviation changing linearly with the fragment mass (Fig. 1B, inset). The 12.000-Da shift was observed in the y8 and y9 fragments as well as in all b fragments. Accurate mass analysis of the mass difference (109.055 Da) between the y8 and y7 fragments revealed the unique elemental composition of the third amino acid (C6H7NO), only 2 mDa away from the theoretical mass (109.053 Da) of the modified proline residue that has the same elemental composition. Thus the identity of the +12.000-Da modified proline was additionally confirmed. Such a proline modification is not reported for humans (12), although analogues of it can be found in the literature (see below).
After the insertion of this modification into the Mascot search as a user-defined modification, the modification position was confirmed with M-score of 48 and a nearly perfect fit of 12 fragment masses (Fig. 1E).
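The accurate mass argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of ModifiComb; the function name `residue_mass` and the table `MONO` are illustrative, and the isotope masses are standard monoisotopic values rounded to eight decimals.

```python
# Monoisotopic masses (Da) of the lightest stable isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def residue_mass(composition):
    """Sum monoisotopic masses for a composition such as {"C": 6, "H": 7}."""
    return sum(MONO[element] * count for element, count in composition.items())

proline = residue_mass({"C": 5, "H": 7, "N": 1, "O": 1})   # unmodified Pro residue
modified = residue_mass({"C": 6, "H": 7, "N": 1, "O": 1})  # Pro residue + one C atom
observed = 109.055  # y8 - y7 mass difference quoted in the text

print(f"proline residue:  {proline:.3f} Da")
print(f"modified residue: {modified:.3f} Da")
print(f"shift:            {modified - proline:.3f} Da")
print(f"deviation:        {(observed - modified) * 1000:.1f} mDa")
```

The shift between the two compositions is exactly one carbon atom, and the observed difference lies within a few mDa of the C6H7NO value, consistent with the assignment given in the text.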
Here we report on a software tool ("ModifiComb") that searches for such peptide families and reveals the PTM and mutation patterns of complex peptide mixtures. The tool "combs out" from large data arrays pairs of peptides with strong sequence similarities, one of which is a base peptide and the other of which is a dependent peptide (Fig. 2). The base peptide is usually identified either via de novo sequencing or database searching, whereas the dependent peptide should not give database identification without variable modifications included. Identification of the base peptide is not critical for the analysis: based on sequence similarity, peptide pairs can be found in a "blind" search without knowing which peptide in the pair is the base one.
ΔM histogram of the differences between them. This ΔM histogram built for one LC/MS run or several related runs represents the overall pattern of all mutations and PTMs present in the corresponding sample. For ΔM values below 100 Da, the mDa mass accuracy of FTMS reveals the corresponding elemental composition of the modification; for ΔM > 100 Da values the high mass accuracy limits the number of possible elemental compositions. Inspection of the MS/MS data can often reveal the position of the modification. Once all the base peptides are identified, the ΔM histogram takes seconds to build. This identification takes seconds if de novo sequencing is used or minutes if a database search is used (the search is typically used without variable modifications or with a few obvious ones, such as oxidation of methionine). Thus the overall data analysis using the ΔM histogram is much faster than the data acquisition, removing one of the throughput bottlenecks. The difference in the retention times, ΔRT, between the dependent and base peptides is used as complementary information, although the intrinsic resolution, precision, and accuracy of RT measurements are much below the mass measurements. The (ΔM, ΔRT) pair provides a two-dimensional map of the present PTMs and mutations.
Several earlier analogues of the ModifiComb approach can be found in the literature. Recently approaches have been developed to minimize the computational cost of complete PTM identification by applying database filters (13, 14). The filters are based on peptide sequence tags (15) extracted from the acquired MS/MS data. The tags reduce the database to a much smaller set of sequence candidates that can be searched with multiple variable modifications in a reasonable time. This approach is still largely limited to known protein sequences and modifications and can miss modifications if they occur inside the sequence tag. Additionally this approach is firmly database-oriented, that is rather slow and sensitive to sequence errors that are present in all databases. Finally this approach does not produce sample-specific fingerprint patterns. An ideologically similar strategy has been described by Zhang et al. (16) for a low resolution ion trap. However, that approach only worked on mixtures containing a few proteins.
Recently Tsur et al. (17) described MS-Alignment, a software tool for a blind PTM search in large MS/MS datasets. Although using an impressively sophisticated alignment algorithm, MS-Alignment has a number of limitations. Integer ΔM values that the algorithm uses mask the underlying complexity of modifications (e.g. modifications with elemental compositions CO (formylation), N2, and C2H4 have the same integer mass of 28). Furthermore MS-Alignment processes CAD-only datasets, and analysis speed requirements limit the ΔM region (−100 to +160 Da in Ref. 17). In contrast, ModifiComb uses accurate mass data, has no limit on ΔM values, and uses combined ECD/CAD datasets. As already mentioned, ModifiComb also makes use of the retention time differences, i.e. uses both dimensions of LC/MS separation, which gives it very high specificity. For instance, a dependent peptide with ΔM = +0.977 Da and a small positive ΔRT is surely a deamidated version of the base peptide, whereas ΔM = 1.003 Da and a large ΔRT is likely due to a monoisotopic mass misassignment in one of the peptides.
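The point about integer masses can be illustrated numerically. The sketch below is not taken from ModifiComb: the helper names `mass` and `assign` and the 5-mDa tolerance are assumptions chosen for illustration. It shows that CO, N2, and C2H4 share the nominal mass 28 yet are separated by more than 10 mDa, so an accurate ΔM selects one of them.

```python
# Monoisotopic masses (Da) of the lightest stable isotopes.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

CANDIDATES = {
    "CO (formylation)": {"C": 1, "O": 1},
    "N2": {"N": 2},
    "C2H4": {"C": 2, "H": 4},
}

def mass(composition):
    """Monoisotopic mass of an elemental composition."""
    return sum(MONO[element] * count for element, count in composition.items())

def assign(delta_m, tol=0.005):
    """Return the candidate compositions within tol (Da) of an observed delta_m."""
    return [name for name, comp in CANDIDATES.items()
            if abs(mass(comp) - delta_m) <= tol]

for name, comp in CANDIDATES.items():
    print(f"{name:18s} nominal {round(mass(comp))}  exact {mass(comp):.4f} Da")

print(assign(27.995))  # compatible only with CO
print(assign(28.031))  # compatible only with C2H4
```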
There is one more significant difference between ModifiComb and other algorithms. The usual approach to reducing the search space for modifications is to identify first the set of proteins present in the sample and then search PTMs in that small database. ModifiComb goes one step further and searches PTMs only for identified peptides, further reducing the search space by an order of magnitude (although 5 peptides per protein are on average identified in our analysis (18), an average protein produces 50 tryptic peptides). The search space reduction diminishes the probability of false positive PTM identification and obviates the development and validation of a special scoring algorithm (see below). The explicit requirement in the ModifiComb non-blind search for the unmodified peptide to be present is a limitation, but not too narrow a one, as most PTMs appear in substoichiometric proportions.
In the current work, we tested ModifiComb and built ΔM, ΔRT histograms and (ΔM, ΔRT) maps for several biological samples. Sensitivity, specificity, and repeatability of the approach were evaluated. Because the ability of the program to find new and unexpected modifications by far exceeds our current capacity to characterize them, here our goal was not to report all findings, and we limited the current report to the demonstration and validation of the ModifiComb operation. Several examples of new modifications and sample fingerprinting (M-fingerprinting) are provided as an illustration, and their potential biological importance is discussed.
| EXPERIMENTAL PROCEDURES |
|---|
K562 human chronic myeloid leukemia, A431 human carcinoma, and Escherichia coli cell lysates were prepared as described previously (11, 18). Briefly, 500, 200, and 70 µg of cell lysates, respectively, were loaded onto a one-dimensional SDS gel, and one lane of each lysate was excised into 20–30 gel pieces. The K562 and E. coli samples were prepared a second time following the same procedure, but the E. coli gel was only cut into seven pieces. In-gel reduction, alkylation, and digestion with trypsin (Promega, Madison, WI) were performed as described in the literature (19). Samples were dried to complete dryness using a SpeedVac and reconstituted immediately prior to analysis in 20 µl of water containing 0.1% TFA.
Liquid Chromatography/Mass Spectrometry
Analysis was performed on a 7-tesla hybrid linear ion trap Fourier transform mass spectrometer (LTQ-FT, Thermo, Bremen, Germany) equipped with a nanoelectrospray ion source (Proxeon Biosystems, Odense, Denmark). An HPLC system was used on line with the mass spectrometer. The system (Agilent 1100 nanoflow) consists of a solvent degasser, a nanoflow pump, and a thermostated microautosampler. Solvents used consisted of 99.5% water and 0.5% acetic acid as buffer A and 90% acetonitrile, 9.5% water, and 0.5% acetic acid as buffer B. The peptide sample was automatically loaded at a flow rate of 500 nl/min onto a 15-cm-long Proxeon nano-ESI emitter (75-µm inner diameter, 360-µm outer diameter) packed in house with fully end-capped Reprosil-Pur C18 3-µm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany). After 20 min of loading time the peptides were eluted from the column with a 90-min gradient (4–45% buffer B) at a flow rate of 200 nl/min. MS analysis was performed using unattended data-dependent acquisition mode in which the mass spectrometer automatically switches between a high resolution survey scan (resolution = 100,000, m/z range 300–1500) followed by lower resolution fragmentation spectra (ECD followed by CAD; resolution = 25,000) of the two most abundant peptides eluting at a given time.
Database Search: Peptide Identification
All data were searched using the Mascot search engine (version 2.1, Matrix Science, London, UK) against either the International Protein Index database (human, 3.10; downloaded September 5, 2005) for saliva, K562, and A431 samples or National Center for Biotechnology Information non-redundant database (www.ncbi.nlm.nih.gov; downloaded September 5, 2005) with taxonomy specified to E. coli. For base peptide identification, only oxidized methionine was chosen as a variable modification. Searches were performed with no enzyme specificity for the human saliva sample and with trypsin specificity (20) for K562 human leukemia, A431 human carcinoma, and E. coli cell lysates; mass tolerance for monoisotopic peptide identification was set to 5 ppm and ±0.02 Da for fragment ions. The instrument setting was "ESI-FTICR," which only permits b, y, b − NH3, and y − H2O fragment ion types. For identification and validation of the dependent peptide sequences revealed by ModifiComb, all data were re-searched with the Mascot search engine allowing for the known or user-defined variable modifications. The peptide mass tolerance, mass accuracy window for fragment ions, and the "no enzyme/enzyme specificity" option as well as the instrument settings were kept unchanged. Parsing of data and statistical analysis of the search results reported by Mascot were performed using the open source software MSQUANT (msquant.sourceforge.net).
Algorithm Description: Overview
The block diagram of the ModifiComb algorithm is presented in Fig. 2. The ModifiComb algorithm works under the assumption that both the unmodified as well as the modified version of a peptide are present in the mixture. The ECD and CAD spectra are treated as described in Ref. 11, and a merged fragment list is submitted to the Mascot search engine. A list of Mascot reliably identified (M > 34) base peptide sequences is created for each sample. Another list is created for dependent peptides that were not identified by Mascot (or received a below threshold score). The dependent peptide fragment lists are then compared with those of base peptides, and for each peptide pair, the molecular mass difference ΔM and retention time difference ΔRT are calculated. The ΔRT value is currently approximated by the difference in the scan number of the dependent and base peptides (any given scan duration is a function of the given ion abundance, but the average value of the scan duration fluctuates insignificantly during the peptide elution time). The algorithm determines that the pair "matches" if a certain predefined number (usually four) of fragments of the dependent peptide either coincide within the given mass accuracy with the observed fragments in the base peptide or the corresponding masses are shifted by ΔM. The "matched" peptide pair is reported in the output file. Simultaneously their ΔM and ΔRT data are added to one-dimensional histograms ΔM and ΔRT and a two-dimensional map (ΔM, ΔRT).
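The accumulation step at the end of this overview can be sketched as follows. This is an illustration rather than the ModifiComb source: the function names are hypothetical, the 5-mDa ΔM channel width is the one quoted for the histograms in Results, and the 20-scan ΔRT bin is an arbitrary assumption.

```python
from collections import Counter

DM_BIN_DA = 0.005   # 5-mDa channels, the width quoted for the histograms in Results
RT_BIN_SCANS = 20   # hypothetical scan-number bin width, chosen only for illustration

def dm_channel(delta_m):
    """Index of the mass-difference channel nearest to delta_m (Da)."""
    return round(delta_m / DM_BIN_DA)

def rt_channel(delta_scans):
    """Index of the scan-difference bin (floor division keeps negative shifts ordered)."""
    return delta_scans // RT_BIN_SCANS

def accumulate(pairs):
    """pairs: iterable of (delta_m in Da, scan-number difference) for matched pairs."""
    dm_hist, rt_hist, dm_rt_map = Counter(), Counter(), Counter()
    for delta_m, delta_scans in pairs:
        m, r = dm_channel(delta_m), rt_channel(delta_scans)
        dm_hist[m] += 1          # one-dimensional mass-difference histogram
        rt_hist[r] += 1          # one-dimensional retention-difference histogram
        dm_rt_map[(m, r)] += 1   # two-dimensional map
    return dm_hist, rt_hist, dm_rt_map

# Invented example pairs: two methylations, one formylation, one +C shift.
pairs = [(14.0157, 240), (14.0151, 345), (27.9949, 180), (12.0001, 610)]
dm_hist, rt_hist, dm_rt_map = accumulate(pairs)
for ch, count in sorted(dm_hist.items()):
    print(f"{ch * DM_BIN_DA:8.3f} Da  {count}")
```

Both invented methylation pairs fall into the same ΔM channel while remaining distinct points on the two-dimensional map, which is the behavior the text describes.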
Details of the Peptide Matching Algorithm
ModifiComb has two regimes: blind and "open eyed." In the latter regime, the base peptides are identified either through the Mascot search or de novo sequencing. In the blind regime, the base peptide remains unidentified. Below a detailed description is given for the open eyed regime in the case of base peptide identification by Mascot (the procedure for de novo sequencing is easily extrapolated). First an initial search is performed with no variable modifications except oxidation of methionine for each dta file containing extracted consensus information from ECD and CAD MS/MS (18). The output contains the received Mascot score M (M > 0 if Mascot suggested a sequence, and M = 0 if Mascot did not make any suggestion) and the corresponding Mascot-suggested sequence for M > 0. The user defines three parameters.
All dta files with M > M1 are considered to belong to base peptides (A), whereas those with M < M2 are viewed as belonging to potential dependent peptides (B). Each possible pair of A and B peptides is considered. For each compared pair, n is calculated in the following way. First ΔM is determined as the difference of the molecular masses between B and A peptides. ΔM is then considered as the mass of the potential modification (thus this approach intrinsically favors single modifications), which can assume a positive as well as a negative value. The Mascot-suggested peptide sequence for A is used to generate a list of b ion masses [b1,..., bL] and y ion masses [y1,..., yL] where L + 1 is the length of the sequence. The masses [m1,..., mk] in the dta file are already tagged with the likely type of the ion they represent, y, b, or by (either b or y ion) (18). The masses tagged y and b are compared with the theoretical sets [y1,..., yL] and [b1,..., bL], respectively, whereas by-tagged ions are compared with both sets. A "match-1" means that the masses of the theoretical and experimental fragments coincide within 20 mDa, whereas a "match-2" means that they differ by ΔM ± 20 mDa. If some fragment of a defined type (b or y) matches by match-2, then all subsequent fragments of the same type should also match as match-2.
This requirement may seem too stringent because the type of the fragment might be identified incorrectly in which case the above requirement may not be fulfilled for a perfectly legitimate pair A and B. However, carefully made validation of the fragment type identification procedure has ensured its extremely high reliability, which stems from the high mass accuracy and the consensus-based selection rules (11, 18). After the matching procedure is done, the number N of matched cleavage sites in the given pair of dta files is counted (matched complementary b and y ions are counted as one cleavage site) and reported. If N ≥ n, the peptide pair is considered a hit, and the ΔM value, N, the scan number difference between the two dta files, their identities, and Mascot scores are reported.
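The matching rules above can be summarized in a short sketch. It is a reading of the description and not the ModifiComb implementation. Several points are assumptions: fragment masses are taken as singly protonated values, a pair that violates the match-2 consistency rule is rejected outright (returns 0), and the names `fragment_ions` and `count_matched_sites` are hypothetical. The 20-mDa tolerance and the threshold of four sites come from the text.

```python
PROTON, WATER = 1.00727647, 18.01056468
RESIDUE = {  # monoisotopic amino acid residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def fragment_ions(seq):
    """Theoretical singly protonated b and y ions, keyed by ion index 1..len-1."""
    m = [RESIDUE[a] for a in seq]
    n = len(m)
    b = {i: sum(m[:i]) + PROTON for i in range(1, n)}
    y = {j: sum(m[n - j:]) + WATER + PROTON for j in range(1, n)}
    return b, y

def count_matched_sites(base_seq, dep_fragments, delta_m, tol=0.020):
    """Count cleavage sites N shared by a base sequence and a dependent spectrum.

    dep_fragments: list of (mass, tag) with tag in {"b", "y", "by"}.
    """
    b, y = fragment_ions(base_seq)
    theo = {"b": b, "y": y}
    kinds = {"b": {}, "y": {}}  # ion type -> {ion index: 1 (match-1) or 2 (match-2)}
    for mass, tag in dep_fragments:
        for ion_type in (("b", "y") if tag == "by" else (tag,)):
            for idx, t in theo[ion_type].items():
                diff = mass - t
                if abs(diff) <= tol:
                    kinds[ion_type].setdefault(idx, 1)
                elif abs(diff - delta_m) <= tol:
                    kinds[ion_type].setdefault(idx, 2)
    # Consistency rule: after the first match-2, later ions of that type must be match-2.
    for found in kinds.values():
        shifted = False
        for idx in sorted(found):
            if found[idx] == 2:
                shifted = True
            elif shifted:
                return 0  # assumption of this sketch: reject the whole pair
    n_res = len(base_seq)
    # b_i and y_(len-i) describe the same cleavage site, so they are counted once.
    sites = set(kinds["b"]) | {n_res - j for j in kinds["y"]}
    return len(sites)

# Invented dependent spectrum: base peptide from Fig. 1 with +12.000 Da on residue 3.
base, dm = "GPPQQGGHQQ", 12.000
b, y = fragment_ions(base)
dependent = [(b[2], "b"), (b[3] + dm, "b"), (b[4] + dm, "b"), (b[5] + dm, "b"),
             (b[6] + dm, "b"), (y[7], "y"), (y[8] + dm, "y")]
n_sites = count_matched_sites(base, dependent, dm)
print(n_sites, "matched cleavage sites; hit:", n_sites >= 4)
```

In the invented example, b3 and y7 are complementary and therefore contribute a single site, as do b2 and y8.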
Database-independent Comparison
A blind search is performed when neither the protein database nor reliable de novo sequencing data are available, e.g. it can be used for analysis of a set of unknown proteins. In the database-independent comparison, all dta files are compared with each other. Fragment masses are compared with each other in the same way as described above taking into account their ion type tags, and the number of common cleavage sites is the discriminating parameter. The major difference with the above database-dependent comparison is that here all ΔM values are given a positive sign. This is because in the database-independent approach it may be difficult to know which peptide in the pair is unmodified; therefore, the lighter peptide is always selected as the base one.
| RESULTS |
|---|
A431 Sample
Fig. 3 shows the ΔM histogram acquired with n = 4 and plotted with a 5-mDa resolution for the region from −100 Da to +100 Da. The two insets show the compressed regions stretching in both directions by 1400 Da. The main histogram contains a number of distinct peaks. The high mass accuracy and resolution of FTMS resolves the peaks and assigns to them unique elemental compositions. Main peaks and their assignments are listed in Table I. The peak abundance is proportional to the abundance of the corresponding modification in the sample; thus the ΔM histogram represents the modification spectrum. Some of the peaks are doublets as an inset around +28 Da shows. The lighter peak G is mainly due to +CO contribution (formylation), whereas the heavier peak H is due to +C2H4 (ethylation or dimethylation).
ΔM due to mutation (ΔM = 1.9979 Da) and due to loss of H2 (ΔM = 2.01565 Da). The total number of different types of modifications detected is hard to estimate because besides the peaks marked in Fig. 3 even peaks with a single count may mean a modification. Given that there are a total of 651 counts in the histogram excluding the 10 peaks marked A through J and assuming on average three counts per peak, we conclude that the histogram contains at least 217 unique modifications. Such a number of types of modifications is much larger than previously reported for a single sample, but it is not unexpected given the huge complexity of the human proteome. As reported above, some of the observed mass differences (modifications) are due to several modifications simultaneously present in the same peptide sequence. This complicates the analysis, but only slightly, because a second ModifiComb pass can be performed with all dependent sequences found in the first pass regarded now as base peptides.
False Positive Rate
The estimate of the number of modification types is only valid if the histogram does not contain many false (random) counts. However, random counts should be distributed on the ΔM scale much more evenly than the insets in Fig. 3 show, indicating that the majority of counts in Fig. 3 are real. To evaluate the rate of false positives more precisely, all data on the base peptides of the A431 sample were replaced by the same number of different base peptides from the E. coli sample (see below), and the base-dependent peptide matching routine was repeated with the new dataset. Thus obtained peptide pairs may include "true" coincidence cases due to sequence homology; therefore, this method overestimates the actual false positive rate. In the region −100 to +100 Da, the ΔM histogram contained only 21 counts (as opposed to a total of 1279 counts in Fig. 3), and outside that region (and within the −1400 to +1400-Da window), there were 159 counts. This means that the above conclusion of hundreds of different types of modifications present in the sample was valid.
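The decoy comparison reduces to a simple ratio. The sketch below only restates the arithmetic implied by the counts quoted above; the function name `decoy_rate` is illustrative, and treating the ratio as an upper bound follows the remark that the method overestimates the false positive rate.

```python
def decoy_rate(target_counts, decoy_counts):
    """Fraction of target counts that could be random, estimated from a decoy run."""
    if target_counts <= 0:
        raise ValueError("target_counts must be positive")
    return decoy_counts / target_counts

# Counts quoted in the text for the central region of the histogram (n = 4).
rate = decoy_rate(target_counts=1279, decoy_counts=21)
print(f"estimated upper bound on false positives: {rate:.1%}")
```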
Reducing the value of n from four to three increased the number of false positives in the −100 to +100-Da region from 21 to 420 counts. This result highlights the importance of extensive sequence information obtained here through the use of complementary CAD and ECD fragmentation. On the other hand, the major peaks in the ΔM diagram increased by 10–20 counts compared with an average background increase of 0.1 count per channel. Thus, channels containing one or two counts become unreliable indicators of the presence of modification, and therefore n = 3 is unacceptable for them. On the other hand, in the major ΔM peaks the signal/noise ratio has not deteriorated dramatically, meaning that n = 3 can be used for them to increase the sensitivity.
Note that the minimal acceptable value of n represents an implicit scoring threshold, and thus ModifiComb does not require explicit scoring unlike other algorithms (17). This removes the necessity of filtering ModifiComb output.
Multiplicity of Modifications
The detected modifications could be confined to a relatively few base peptides that are for some reasons prone to modifications or could be more or less homogeneously spread over many base peptides. To find out the actual situation, the modification multiplicity values (number of different dependent peptides per unique base peptide) were calculated for every base peptide. There were 824 cases with multiplicity 1, 36 with multiplicity 2, and four cases with multiplicity 3. As expected, the "one peptide-one modification" model dominated. Analysis showed that the base peptides with the highest multiplicity values came from the most abundant proteins.
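The multiplicity statistic can be computed directly from the list of reported pairs. This is a minimal sketch with invented identifiers; the function name `multiplicity_distribution` and the pair format are assumptions, not the ModifiComb output format.

```python
from collections import Counter

def multiplicity_distribution(pairs):
    """Map multiplicity -> number of base peptides with that many distinct dependents.

    pairs: iterable of (base_id, dependent_id) for reported hits.
    """
    per_base = {}
    for base_id, dep_id in pairs:
        per_base.setdefault(base_id, set()).add(dep_id)  # sets ignore duplicate hits
    return Counter(len(deps) for deps in per_base.values())

# Invented example: base A has two distinct dependents, B and C have one each.
hits = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1"), ("A", "a1")]
dist = multiplicity_distribution(hits)
print(sorted(dist.items()))
```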
Efficiency
Identification of each base/dependent pair of peptides means that another MS/MS dataset that has previously been ignored is now explained. Thus the total efficiency of the proteomic analysis increases by 3.5%, an increase comparable with the efficiency of full-length de novo sequencing. Because of the differences in the extent of modification, this increment depended upon the analyzed organism (see below) and was higher for complex organisms (humans) than for primitive ones (E. coli).
The efficiency of ModifiComb in terms of false negatives (misses of true, present modifications) was tested for methylations. A Mascot search with methylation as a variable modification, which took much longer than the ModifiComb run, produced the same number of hits and did not reveal any new methylated peptide.
The Role of ΔRT
The retention time histogram resembles that of ΔM (data not shown), but the RT resolving power of nanoflow HPLC is much lower than the mass resolution of the FTMS instrument. Nevertheless this resolution is often sufficient to provide fine details on the position of modification. For example, Fig. 4a shows at least two peaks separated on the ΔRT scale, both corresponding to the same ΔM value of 27.995 Da (formylation). The first peak, at approximately +180 scans, corresponds to formylation of a side chain of serine and threonine (70% of peptides contributing to the peak carry this modification), whereas the peak at +600 scans corresponds to modification of the N terminus (85% of the contributing peptides). The methylation peak (Fig. 4b) also features two components, a sharp peak at 50 scans and a smaller, broader peak at 300 scans. The heterogeneity of methylation is considered below in more detail.
ε-amine on lysine, the imidazole group of histidine, the guanidine moiety of arginine, and the side-chain nitrogen of glutamine and asparagine. It is a permanent modification not readily reversible under physiological conditions.
The fact that different methylation sites form clearly separated peaks in the ΔRT spectrum (Fig. 4b) can be explored to study the heterogeneity of methylation sites. As an example of such a study, Fig. 5 shows the case of a base peptide, 175TATPQQAQEVHEK187, from triose-phosphate isomerase identified from an A431 human carcinoma cell lysate through Mascot search with a very high M-score of 76. Manual validation of both CAD and ECD spectra (Fig. 5A and inset therein) confirmed the identification, revealing complete cleavage coverage of the sequence. The peptide was detected at RT = 36.95 min, and ModifiComb found two modified forms of this peptide eluting 3.5 and 5.0 min later (Fig. 5B) with the same ΔM = +14.015 Da corresponding to methylation. From the extracted ion chromatogram in Fig. 5B, the abundance ratio between these two species is 1:0.8. The ΔRT values (+240 and +345 scans, respectively) correspond to different components of the right-hand tail of the ΔRT distribution in Fig. 4b. Manual inspection of the CAD and ECD spectra of these two dependent peptides (Fig. 5, C, D, and insets therein) located the positions of methylation at Glu186 and His185, respectively. Note that ECD was instrumental in locating these modifications due to abundant c11 and c12 ions (b11 and especially b12 ions in CAD were unreliable because of the poor signal/noise and the presence of losses from them that made interpretation equivocal). It is hardly a coincidence that adjacent amino acids so different in their chemical properties are methylated, whereas no sign of modification is found on the proximate Glu183 residue.
(ΔM, ΔRT) dots can be put on a two-dimensional map to provide a total overview of the modification state of the sample. Such a (ΔM, ΔRT) map is shown in Fig. 6a. The ΔM scale consists of 5-mDa wide channels; to simplify the map and highlight the most abundant modifications, only channels containing at least two counts are displayed. The map contains a total of 908 dots corresponding to 45 ΔM channels. The ΔRT trends for each modification are clearly discernible.
ΔM channels, a clear testimony to the much lower M-complexity of the bacterium.
On the map, the E. coli modification pattern may resemble a reduced human pattern, but this resemblance is superficial. Fig. 7a shows the ΔM plots of both samples in comparison from which their differences are apparent. The product/moment correlation (Pearson correlation (30)) analysis confirms the dissimilarity of these two patterns (r = 0.45).
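The product-moment comparison of two ΔM patterns can be sketched as below. This is an illustration under stated assumptions: histograms are dictionaries mapping a channel index to a count, the comparison runs over the union of occupied channels (how empty channels were treated in the original analysis is not stated), and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def compare_histograms(h1, h2):
    """Correlate two channel -> count histograms over the union of their channels."""
    channels = sorted(set(h1) | set(h2))
    x = [h1.get(c, 0) for c in channels]
    y = [h2.get(c, 0) for c in channels]
    return pearson_r(x, y)

# Invented histograms keyed by channel index.
sample_a = {2803: 40, 5599: 12, 3199: 25}
sample_b = {2803: 5, 5599: 30, 3199: 8, 2400: 3}
print(f"r = {compare_histograms(sample_a, sample_b):.2f}")
```

Because r depends on the overall shape of the two patterns and not on absolute counts, scaling one histogram by a constant factor leaves the value unchanged.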
ΔM Analysis
ΔM analysis. Note that for the high repeatability of ΔM analysis it is not essential that the instrument picks up exactly the same pairs of peptides in each analysis nor is the difference in absolute retention times of peptides important. Of course, at different sample loads the signal to noise ratio in ΔM plots will change, and some peaks may change their abundance or even disappear if they are small. This, however, is of lesser importance for the correlation analysis as the product-moment correlation factor r picks up the overall similarity between the patterns.
Human Versus Human
Fig. 7d presents a comparison between different human cell lines, A431 and K562. The obtained correlation r = 0.49 is just a little higher than that between human and E. coli samples, indicating that ModifiComb ΔM analysis can be used for assessing modification states of the same organism. Detailed analysis of the differences in the modification states between these and other human cell lines will be reported separately. Here we just note that the two largest differences observed between the A431 and K562 cell lines are in the extent of methylation and loss of NH3.
Database-independent Search
As already mentioned, identification of the sequence of the base peptide is not essential for the ModifiComb algorithm. To test the performance in the blind mode, the human A431 cell line ΔM histogram with base peptides obtained through Mascot search was compared with the histogram where peptide pairs were found through blind search in which the sequence of the base peptide was not known. As already mentioned, in such a search there is no difference between the base and dependent peptide, and because of that, absolute ΔM values were plotted. The obtained ΔM histogram was compared with the respectively modified ΔM histogram from Fig. 7a. To make the comparison fair, the oxygen peak F was removed from the blind search spectrum (this peak is mainly due to oxidized methionine and is not prominent in the database-dependent spectrum because the Mascot search did not include methionine-oxidized peptides in the list of base peptides). The two normalized distributions look similar (Fig. 8). Indeed correlation analysis confirms the high degree of similarity between the two patterns (r = 0.97). Note that the similarity between the database search and blind search ΔM spectra is much larger than that between two analyses of independently prepared samples of the same protein mixture. Thus both blind and open eyed methods can be used for fingerprinting the sample through ΔM histograms or (ΔM, ΔRT) maps.
The ΔM histogram of the saliva peptide sample is shown in Fig. 9. The most abundant detected modification corresponds to ΔM = +12.000 Da, an unexpected value not found among common modifications (12). The mass shift can only correspond to a carbon atom. One of the contributing peptide pairs to that modification has already been shown in Fig. 1; the modification is located on a proline residue. Here we discuss the possible biological implications of this finding stemming from the fact that at least six base peptides related to this modification were identified as belonging to proline-rich proteins (PRPs). One of the primary functions of saliva PRPs in mammals is to precipitate tannins, polyphenolic compounds commonly found in certain beverages, fruits, and berries. Tannins exhibit a variety of harmful effects ranging from reduction of the nutritional value of food to causing esophageal cancer (32). Strong binding of tannins to PRPs is believed to be the first line of defense against these harmful compounds (33, 34). Transformation of the proline pyrrole five-member ring into a pyrrolidine six-member ring (+12.000 Da) makes the modified endogenous saliva peptides slightly more hydrophobic than the unmodified counterparts. Although the base and +12.000-Da peptides are shown in the same integrated spectrum in Fig. 1A, the chromatographic peak of the dependent peptide was delayed by 9 min compared with the base peptide. Because the main interaction between the tannins and PRPs is the hydrophobic stacking of the phenolic ring of a polyphenol over the pyrrole rings of the Pro residues (33, 35), an increase in the hydrophobicity of the latter makes the stacking interaction stronger, which may improve the defense against tannins. This modification may be similar in structure to the rare amino acid baikiain (36), which has not been reported previously in humans.
| CONCLUSION AND DISCUSSION |
|---|
ΔM histogram reveals the overall modification pattern. Additionally ΔRT information confirms the nature of the modifications. A two-dimensional (ΔM, ΔRT) map demonstrates repeatable features for the same biological sample independent from the method of map generation (database or blind search). On the other hand, the maps are different for different organisms and samples, which may be used for comparative proteomic work and searching for biomarkers.
The (ΔM, ΔRT) map immediately reveals the PTMs present in the sample without making a priori assumptions of their chemical composition or the site of attachment. All abundant present modifications are detected, including ones that are unexpected and novel. The sensitivity of ModifiComb should be higher than that of the database search with variable modifications as ModifiComb separates the function of the peptide identification from the function of the PTM assignment and is usually satisfied with lower information content from MS/MS data of modified peptides (at least four fragments) than is required for reliable database identification of unmodified peptides (at least six to seven fragments). ModifiComb is also much faster than any database search engine. The information in the PTM spectrum derived from the sample can be used to minimize the number of modifications allowed as variable parameters in the subsequent database search, thereby speeding it up as well as reducing the rate of false positive and false negative identifications.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, January 25, 2006, DOI 10.1074/mcp.T500034-MCP200
1 The abbreviations used are: PTM, post-translational modification; CAD, collisionally activated dissociation; ECD, electron capture dissociation; RT, retention time; ΔRT, retention time difference; ΔM, mass difference; PRP, proline-rich protein.
* This work was supported by the Knut and Alice Wallenberg Foundation and Wallenberg Consortium North Grant WCN2003-UU/SLU-009 (to R. Z.) as well as Swedish Research Council Grants 621-2004-4897 and 621-2003-4877 (to R. Z.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Both authors contributed equally to this work.
To whom correspondence should be addressed: Laboratory for Biological and Medical Mass Spectrometry, Uppsala University, Box 583, S-75123 Uppsala, Sweden. Tel.: 46-18-471-5729; Fax: 46-18-471-5729; E-mail: Mikhail.Savitski{at}bmms.uu.se
| REFERENCES |
|---|
antibodies. Anal. Chem. 77, 6004–6011
ε-N-Methyl-lysine in bacterial flagellar protein. Nature 184, 56–57