Molecular & Cellular Proteomics 5:998-1005, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Laboratory for Biological and Medical Mass Spectrometry, Biomedical Centre, Box 583, Uppsala University, SE-75123 Uppsala, Sweden, the Department of Pharmaceutical Biosciences, Biomedical Centre, Uppsala University, SE-75124 Uppsala, Sweden, and The Rockefeller University, New York, New York 10021
ABSTRACT
The study of endogenously processed peptides has been termed "peptidomics" (3). Peptidomics complements molecular biological approaches in its ability to characterize the processing of functional gene products. It allows direct observation of changes in the amounts of peptides and small proteins and in their post-translational modifications. The main difficulties in the analysis of endogenous peptides are their rapid degradation during extraction and purification (4) and their low abundance: on average, their tissue content is less than 0.1% of that of proteins (5). Endogenous peptides also often carry post-translational modifications (PTMs)1 (e.g. acetylation, amidation, and phosphorylation), which make the resulting mass spectra more difficult to interpret.
An important functional group within the peptidome comprises the endogenous peptides of the brain. These neuropeptides range in length from 3 to 100 amino acid residues and are up to 50 times larger than classical neurotransmitters (6). Neuroactive peptides are derived from secretory proteins that are synthesized in the cell body on polyribosomes attached to the cytoplasmic surface of the endoplasmic reticulum. The precursors are then processed in the endoplasmic reticulum and transported to the Golgi apparatus for further processing. In the central nervous system, most neurons contain biologically active peptides together with classical neurotransmitters. Neuropeptides are implicated in the pathology of various neurological and psychiatric disorders such as depression, neurodegenerative diseases, and eating and sleeping disorders (2).
Despite their biological and physiological importance, public databases currently offer little easily accessible information on endogenous peptides, which makes it difficult to identify them in complex samples. MS in combination with two-dimensional gel electrophoresis or LC has become the main tool in proteomics for identifying peptides and proteins and typically generates large data sets (7). A search engine compares these data with protein sequence collections such as the UniProt Knowledgebase (8) or the non-redundant (nr) protein sequence collection from the National Center for Biotechnology Information (NCBI). These protein sequence databases also offer additional information, including brief functional descriptions (if available), annotation of sequence features (e.g. modifications), secondary and tertiary structure predictions, key references, and links to other databases. Recently, a number of databases have become oriented toward specific proteomic subareas (5, 9, 10). Although several of these databases are well organized and easy to use, they do not always meet new requirements. At present there is no searchable database specifically designed for the identification of endogenous peptides.
In the present study we have developed a database for endogenous peptides and small proteins below 10 kDa. The database contains biologically active peptides such as classical neuropeptides and hormones, potentially biologically active peptides, and uncharacterized peptides. Several examples of improved neuropeptide identification using SwePep and MS are presented.
EXPERIMENTAL PROCEDURES
Data Model
The SwePep database is implemented as a relational database (13) using a MySQL database management system (11). SwePep is specifically designed for endogenous peptides. Every peptide in the database is linked to the following information: name, sequence, precursor protein, position in the precursor sequence, modifications, location, organisms, reference, mass, and pI. The database is designed to minimize data redundancy. Therefore some objects are split into two or more linked tables, e.g. peptide and peptide type. In this way the peptide sequence, mass, name, and pI are stored only once in the database even though the peptide may occur many times in different precursors (Fig. 1).
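The normalization described above can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (peptide, precursor, peptide_in_precursor) and the toy rows are hypothetical and are not SwePep's actual schema; the sketch only shows how sequence, mass, name, and pI can be stored once per peptide while each occurrence in a precursor is recorded as a separate linking row.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each unique peptide is stored once, with its sequence-level properties.
    CREATE TABLE peptide (
        peptide_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        sequence   TEXT NOT NULL UNIQUE,
        mono_mass  REAL NOT NULL,
        pi         REAL
    );
    -- Each precursor protein is stored once.
    CREATE TABLE precursor (
        precursor_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        organism     TEXT NOT NULL
    );
    -- One row per occurrence of a peptide in a precursor (position-specific data).
    CREATE TABLE peptide_in_precursor (
        peptide_id   INTEGER NOT NULL REFERENCES peptide (peptide_id),
        precursor_id INTEGER NOT NULL REFERENCES precursor (precursor_id),
        start_pos    INTEGER NOT NULL,
        end_pos      INTEGER NOT NULL,
        PRIMARY KEY (peptide_id, precursor_id, start_pos)
    );
    """
)

# Toy data (hypothetical): one peptide occurring three times in two precursors.
conn.execute("INSERT INTO peptide VALUES (1, 'peptide X', 'YGGFM', 573.2257, NULL)")
conn.executemany(
    "INSERT INTO precursor VALUES (?, ?, ?)",
    [(1, "precursor A", "Rattus norvegicus"), (2, "precursor B", "Mus musculus")],
)
conn.executemany(
    "INSERT INTO peptide_in_precursor VALUES (?, ?, ?, ?)",
    [(1, 1, 100, 104), (1, 1, 136, 140), (1, 2, 100, 104)],
)

# Sequence and mass are stored once; a join recovers every occurrence.
rows = conn.execute(
    """
    SELECT p.name, p.sequence, p.mono_mass, r.name, r.organism, o.start_pos, o.end_pos
    FROM peptide_in_precursor AS o
    JOIN peptide AS p USING (peptide_id)
    JOIN precursor AS r USING (precursor_id)
    ORDER BY r.name, o.start_pos
    """
).fetchall()
for row in rows:
    print(row)
```

Recording a further occurrence of the same peptide adds only one row to the linking table; the sequence and mass are never duplicated.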
Currently SwePep consists of 4180 unique endogenous peptides, many of which are post-translationally modified. So far, approximately 100 neuropeptides have been experimentally identified from brain tissue in our laboratory. The neuropeptides in SwePep have been derived from 1643 precursor proteins from 394 different species. All peptides have searchable descriptors such as mass (monoisotopic and average), modifications, precursor information, and organism affiliation. Because the experimental data contain peptides and proteins in the mass range up to 10 kDa, the SwePep database also contains 25,047 small proteins with a sequence length of 120 amino acids or less. This allows more of the components of experimental samples to be identified. The current number of entries in the SwePep database is shown in Table I.
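The mass descriptor can be derived directly from the sequence. The following sketch, assuming standard monoisotopic residue masses and a small hypothetical table of PTM mass shifts, computes the neutral monoisotopic mass of a peptide and applies the 120-residue cutoff used for the small-protein entries. SwePep's own mass calculation is not described in the text; average masses would be computed in the same way from average residue masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056  # added once for the free N and C termini

# Monoisotopic mass shifts (Da) for a few PTMs (illustrative subset, assumption).
PTM_DELTA = {"amidation": -0.98402, "acetylation": 42.01057, "phosphorylation": 79.96633}


def mono_mass(sequence: str, ptms=()) -> float:
    """Neutral monoisotopic mass of a peptide, including the listed PTM shifts."""
    residues = sum(MONO_RESIDUE[aa] for aa in sequence.upper())
    return residues + WATER_MONO + sum(PTM_DELTA[p] for p in ptms)


def is_small_protein(sequence: str, max_length: int = 120) -> bool:
    """Length cutoff for the small-protein entries (at most 120 residues)."""
    return len(sequence) <= max_length


print(round(mono_mass("RPKPQQFFGLM", ["amidation"]), 3))  # amidated substance P
```

For C-terminally amidated substance P (RPKPQQFFGLM-NH2) this gives about 1346.73 Da.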
The SwePep database is also populated with novel peptides identified in our laboratory from the brain tissue of different species. For this data set, SwePep also contains information about the experimental conditions, such as sample information (i.e. species and treatment), raw mass spectral data, and processed data.
Classification of Peptides in SwePep
To ensure that the information in the SwePep database is reliable, all peptides that are stored in SwePep are sorted into three different classes: (i) biologically active peptides, (ii) potential biologically active peptides, and (iii) uncharacterized peptides.
Biologically Active Peptides
This group of peptides contains the classical neuropeptides, such as substance P, neurotensin, enkephalins, and dynorphins, that are present in a neuron together with classical neurotransmitters. This group also contains peptides functioning as hormones, a class of peptides that are secreted into the blood stream to exert endocrine functions. All the neuropeptides and hormones in this group have known biological functions.
Potential Biologically Active Peptides
This group contains pharmacologically uncharacterized peptides (between 3 and 100 amino acids) that are potentially biologically active. They have been identified in tissues or body fluids in which proteolytic activity was instantly inactivated postmortem or postsampling, and they have characteristics similar to those of neuropeptides and hormones, i.e. they are flanked by specific convertase processing sites in their precursors (15). Modifications such as C-terminal amidation and N-terminal acetylation are regarded as important criteria, because many bioactive peptides are amidated through conversion of a C-terminal glycine to a carboxamide.
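As an illustration of the processing-site criterion, the following deliberately simplified heuristic cuts a precursor at dibasic pairs (KK, KR, RK, RR), flags fragments that end in glycine as amidation candidates (removing the glycine), and keeps mature peptides of 3 to 100 residues. The function names and the toy sequence are hypothetical, and this is not SwePep's classification procedure: real prohormone convertases also cleave at monobasic sites, and leftover basic residues are trimmed by carboxypeptidases, neither of which is modeled here.

```python
from dataclasses import dataclass

BASIC = "KR"


@dataclass
class PredictedPeptide:
    sequence: str
    amidated: bool


def cleave_at_dibasic(precursor: str) -> list[str]:
    """Split a precursor at dibasic pairs (KK, KR, RK, RR), discarding the pair."""
    fragments, start, i = [], 0, 0
    while i < len(precursor) - 1:
        if precursor[i] in BASIC and precursor[i + 1] in BASIC:
            if i > start:
                fragments.append(precursor[start:i])
            i += 2  # skip the dibasic pair
            start = i
        else:
            i += 1
    if start < len(precursor):
        fragments.append(precursor[start:])
    return fragments


def predict_peptides(precursor: str, min_len: int = 3, max_len: int = 100) -> list[PredictedPeptide]:
    """Glycine-extended fragments become amidation candidates with the Gly removed."""
    predicted = []
    for frag in cleave_at_dibasic(precursor):
        amidated = frag.endswith("G") and len(frag) > 1
        mature = frag[:-1] if amidated else frag
        if min_len <= len(mature) <= max_len:
            predicted.append(PredictedPeptide(mature, amidated))
    return predicted


print(predict_peptides("AAKRGGLMGKRPP"))  # toy precursor sequence
```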
Uncharacterized Peptides
Peptides that do not fulfill the criteria of the groups above belong to this group. Among others, it includes peptides from samples in which proteolysis was not rapidly inactivated postsampling.
Sample Preparation and Mass Spectrometry Analysis
Rats (Sprague-Dawley) and mice (C57BL/6) were sacrificed as previously described (4) (Muromachi Kikai, Tokyo, Japan). The brain regions of interest were then rapidly dissected out and stored at -80 °C. The brain tissue was suspended in cold extraction solution (0.25% acetic acid) and homogenized by microtip sonication (Vibra cell 750, Sonics & Materials Inc., Newtown, CT) to a concentration of 0.2 mg of tissue/µl. The suspension was centrifuged at 20,000 × g for 30 min at 4 °C. The protein- and peptide-containing supernatant was transferred to a centrifugal filter (Microcon YM-10, Millipore, Bedford, MA) with a molecular mass cutoff of 10,000 Da and centrifuged at 14,000 × g for 45 min at 4 °C. Finally, the peptide filtrate was immediately frozen and stored at -80 °C until analysis. The peptide extract was separated by on-line nanoflow reversed-phase capillary liquid chromatography (Ettan MDLC, GE Healthcare, Uppsala, Sweden) and analyzed by ESI-MS on a Q-TOF (Waters), Finnigan LTQ, or LTQ-FT (Thermo Electron, San Jose, CA) mass spectrometer (4).
RESULTS AND DISCUSSION
By classifying the peptides in SwePep into three different classes: (i) biologically active peptides, (ii) potential biologically active peptides, and (iii) uncharacterized peptides, it is possible to store peptides and protein fragments not proven to be biologically active. Peptides that belong to the group of potential or uncharacterized peptides are moved to the group of biologically active peptides if they demonstrate biological activity.
Previous studies suggest that protein elimination and degradation should not be considered random proteolysis yielding free amino acids that are subsequently used for various metabolic purposes. Instead, it should be regarded as a complex process regulated by a system of tissue-specific enzymes and protein substrates. The resulting peptides, complementary to the conventional regulatory systems, may be considered another type of peptidergic regulatory system, giving rise to a large group of peptides defined as the tissue-specific peptide pool (18). For example, hemorphins are small peptides generated by enzymatic hydrolysis of hemoglobin or blood (19-21). Their physiological functions have been discussed because they are found in a variety of mammalian tissues and fluids (22-27). Hemorphin peptides were previously found in brain tissue that had not been proteolytically deactivated (28) but were not detected in tissue that had been proteolytically deactivated (4). However, hemorphins are claimed to have biological activity: they reportedly cause constriction of coronary vessels and platelet aggregation (24) and inhibit angiotensin-converting enzyme activity (29).
Searching for Peptide Identities Using SwePep
The whole peptide identification procedure, from experiment to a list of identified peptides, is shown in Fig. 2. When using SwePep for peptide identification, the user typically starts by selecting a mass tolerance based on the mass accuracy of the mass spectrometer used in the analysis. A file containing the experimental peptide masses is then matched against theoretically calculated masses in the database. The matching is performed both with and without annotated PTMs. It is also possible to add non-annotated modifications to the peptides to investigate other possible modifications.
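The matching step can be sketched as follows, assuming that each database entry carries an unmodified monoisotopic mass and a list of annotated PTMs. The Entry class, the PTM_DELTA table, and the toy entry are hypothetical. Every experimental mass is compared with the unmodified form, the annotated form, and optionally forms carrying user-added modifications, using either an absolute (Da) or a relative (ppm) tolerance.

```python
from dataclasses import dataclass, field

# Monoisotopic mass shifts (Da) for a few common PTMs (illustrative subset).
PTM_DELTA = {
    "amidation": -0.98402,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


@dataclass
class Entry:
    """Hypothetical database entry: unmodified mass plus annotated PTMs."""
    name: str
    unmodified_mass: float
    annotated_ptms: list[str] = field(default_factory=list)


def candidate_masses(entry: Entry, extra_ptms=()) -> list[tuple[str, float]]:
    """Theoretical masses with and without annotated PTMs, plus user-added PTMs."""
    forms = [("unmodified", entry.unmodified_mass)]
    if entry.annotated_ptms:
        shift = sum(PTM_DELTA[p] for p in entry.annotated_ptms)
        forms.append(("+".join(entry.annotated_ptms), entry.unmodified_mass + shift))
    # Iterate over a copy so newly added forms are not extended again.
    for label, mass in list(forms):
        for ptm in extra_ptms:
            new_label = ptm if label == "unmodified" else f"{label}+{ptm}"
            forms.append((new_label, mass + PTM_DELTA[ptm]))
    return forms


def within_tolerance(observed: float, theoretical: float, tol: float, unit: str = "Da") -> bool:
    """Absolute (Da) or relative (ppm) mass tolerance check."""
    if unit == "Da":
        return abs(observed - theoretical) <= tol
    if unit == "ppm":
        return abs(observed - theoretical) / theoretical * 1e6 <= tol
    raise ValueError(f"unknown tolerance unit: {unit}")


def match_masses(experimental, entries, tol, unit="Da", extra_ptms=()):
    """Return (observed mass, entry name, modification label, theoretical mass) hits."""
    hits = []
    for observed in experimental:
        for entry in entries:
            for label, theoretical in candidate_masses(entry, extra_ptms):
                if within_tolerance(observed, theoretical, tol, unit):
                    hits.append((observed, entry.name, label, theoretical))
    return hits


db = [Entry("peptide X", 1000.0, ["amidation"])]  # toy entry
print(match_masses([999.02], db, tol=0.2))
print(match_masses([1078.98], db, tol=0.2, extra_ptms=("phosphorylation",)))
```

The second query only matches because phosphorylation is supplied as a user-added, non-annotated modification.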
Example: Neuropeptide Identification in Hypothalamus
We have developed a new approach for studying a large number of neuropeptides and used it to investigate the endogenous neuropeptide content of hypothalamic brain tissue samples from rat (4, 28). The MS data of the neuropeptide and small protein content in the hypothalamus were processed with automated software (DeCyder MS, GE Healthcare). The resulting list of deconvoluted masses was matched against the SwePep database for neuropeptide matches. A match was considered valid only if the absolute difference between the theoretical and experimental mass did not exceed 0.2 Da. All positive matches were recorded, and subsequent analysis was performed on the experimental data pertaining to these matches to streamline the identification procedure. The final validation step was manual inspection of tandem mass spectra, searching of sequence collections with tandem MS data, or a combination of both.
From hypothalamic mouse brain tissue, DeCyder MS detected approximately 400 specific peptide masses. SwePep suggested 54 neuropeptide candidates, of which 31 were verified by tandem mass spectrometry (Table II).
Post-translational Modifications in SwePep
Characterizing all modifications is important for understanding the biological function and regulation of the peptides. Unfortunately, it is both time-consuming and difficult to fully characterize peptides and proteins with respect to their modifications. Important modifications include acetylation, amidation, phosphorylation, and sulfation (2), and approximately 300 different modifications have been reported for proteins (30). For example, 50-90% of eukaryotic proteins synthesized in the cytoplasm are isolated with acetylated N termini (31), including the opioid neuropeptide dynorphin, which is acetylated after it has been cleaved from its larger precursor (32). It is also estimated that about 30% of mammalian proteins are phosphorylated (33).
Furthermore, disulfide bonds are frequent modifications among peptides. Because peptides are small, disulfide bonds provide the constraints necessary for a well defined three-dimensional structure. This adds another level of complexity, because many disulfide-linked peptides remain intact in tandem MS (34). The fact that endogenous peptides are often modified is also reflected in SwePep, where the majority of the peptides are modified; e.g. 122 of the 195 peptides found in mouse have annotated modifications, and 58 of these 122 have more than one annotated modification. Having information about modifications, and thereby accounting for the resulting changes in molecular mass, makes identification of modified peptides easier.
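The mass bookkeeping for a modified peptide can be written as a simple sum. Assuming neutral monoisotopic masses and standard shift values (these values are not given in the text above), a peptide with annotated modifications $i$ and $n_{\mathrm{SS}}$ disulfide bonds has

```latex
M_{\mathrm{mod}} = M_{\mathrm{unmod}} + \sum_{i} \Delta m_i - n_{\mathrm{SS}} \cdot 2.01565\ \mathrm{Da}
```

where each disulfide bond removes two hydrogen atoms (2 × 1.00783 Da). Typical shifts are +42.01057 Da for acetylation, -0.98402 Da for C-terminal amidation, and +79.96633 Da for phosphorylation; for example, an amidated peptide with one disulfide bond is 0.98402 + 2.01565 = 2.99967 Da lighter than its unmodified, reduced form.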
In the hypothalamic brain tissue example above, we could identify a number of neuropeptides with different PTMs using SwePep. Several of the identified neuropeptides, such as corticotropin-lipotropin intermediary peptide (CLIP) and substance P, were C-terminally amidated. N-terminally acetylated stathmin was identified, as was gonadoliberin I carrying both an N-terminal pyrrolidone carboxylic acid and a C-terminal amidation. In addition, both a phosphorylated (at Ser14) and a non-phosphorylated form of CLIP were identified. Searching the SwePep database for peptides matching the experimental masses 2505.01 and 2585.23 Da with a mass tolerance of 0.2 Da returned one matching peptide for each mass. The suggested identities were Arg-CLIP and its phosphorylated form. The identities were confirmed by tandem mass spectrometry. Some of these neuropeptides would have been difficult to identify without the identity suggested by SwePep.
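The pairing of the two CLIP masses can be checked with simple arithmetic. The sketch below assumes that the reported values are monoisotopic and that each carries the 0.2 Da search tolerance, and tests whether their spacing is consistent with a single phosphate group (HPO3, +79.96633 Da). The function name and the 2 × tolerance rule are illustrative assumptions, not part of the SwePep search.

```python
# Monoisotopic mass of one phosphate group added to Ser/Thr/Tyr (HPO3), in Da.
PHOSPHO_MONO = 79.96633


def is_single_phospho_pair(m_unmod: float, m_mod: float, tol: float = 0.2) -> bool:
    """Spacing consistent with one phosphorylation, allowing `tol` on each mass."""
    spacing = m_mod - m_unmod
    # Errors on the two independent measurements can add up, hence 2 * tol.
    return abs(spacing - PHOSPHO_MONO) <= 2 * tol


spacing = 2585.23 - 2505.01  # experimental masses from the text
print(f"spacing = {spacing:.2f} Da")
print(is_single_phospho_pair(2505.01, 2585.23))
```

The observed spacing of about 80.22 Da deviates from 79.966 Da by roughly 0.25 Da, which lies within the combined tolerance of the two measurements; if the values were average masses, the average HPO3 mass of about 79.98 Da would lead to the same conclusion.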
Accurate Mass Identification of PEP-19 Using ESI FT-ICR MS
In a proteomic study of an animal model of Parkinson disease, we observed a decreased level of a 6.7-kDa peptide in mouse striatum using nano-LC ESI Q-TOF MS (35). Accurate mass data for the protein were subsequently acquired using nano-LC ESI LTQ-FT, and the MS data were compared with SwePep. Because the manufacturer specifies the mass accuracy of the LTQ-FT mass spectrometer as better than 2 ppm with external calibration (36), limiting the search to 10 ppm ensured that all possible peptide matches in the database were retrieved. Two matches corresponding to the molecular mass of the peptide were returned: acetylated PEP-19 (mass, 6714.2604 Da) from mouse/rat and small venom protein 1 precursor (mass, 6714.2433 Da) from a parasitoid wasp. The mass was calculated from the most intense charge state, at m/z 747.0338 (Fig. 3). Tandem MS confirmed the identity of the protein as acetylated PEP-19.
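The ppm window can be made concrete with a short calculation. The text does not state the charge state of the m/z 747.0338 ion; the sketch assumes it is [M+9H]9+, the integer charge consistent with a mass near 6.7 kDa, and that the listed candidate masses refer to the same isotopic species as the measured ion. Both candidates fall inside the 10 ppm window, and they are themselves only about 2.5 ppm apart, which illustrates why accurate mass alone could not decide between them and tandem MS was needed.

```python
PROTON = 1.007276  # mass of a proton (Da)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass from an [M+zH]z+ ion observed at the given m/z."""
    return z * (mz - PROTON)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Assumption: the m/z 747.0338 ion is the 9+ charge state (not stated in the text).
observed = neutral_mass(747.0338, 9)
candidates = {
    "acetylated PEP-19": 6714.2604,
    "small venom protein 1 precursor": 6714.2433,
}
for name, theoretical in candidates.items():
    error = ppm_error(observed, theoretical)
    print(f"{name}: {error:+.2f} ppm, within 10 ppm: {abs(error) <= 10}")

# Separation between the two candidates themselves, in ppm.
print(f"candidate spacing: {ppm_error(6714.2604, 6714.2433):.2f} ppm")
```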
Recently we were able to identify a number of novel endogenous peptides from rat hypothalamus, as well as post-translational modifications of some of them. These novel peptides have been added to SwePep. We have also identified an additional 30 novel peptides from various regions of the mouse and rat brain and added them to SwePep. The identities of these peptides will be published separately. Our technology, which includes instant deactivation of processing enzymes in the brain and highly sensitive MS analysis (4), may contribute to the identification of additional novel biologically active neuropeptides, which will be added to the SwePep database.
Concluding Remarks
We have developed a novel database for endogenous peptides, SwePep, that contains approximately 4200 endogenous peptides, hormones, potential neuropeptides, and uncharacterized peptides from 394 different species to facilitate and improve endogenous peptide identification using MS. A light version of the SwePep database is accessible on the internet at www.swepep.org. The website will be expanded continuously. Peptides can be searched by mass, name, organism affiliation, UniProt accession number, or a combination of these. The search results contain detailed information about each peptide, such as precursor name, precursor sequence, peptide name, mass, sequence, peptide function, and references.
FOOTNOTES
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Published, MCP Papers in Press, February 26, 2006, DOI 10.1074/mcp.M500401-MCP200
1 The abbreviations used are: PTM, post-translational modification; CLIP, corticotropin-lipotropin intermediary peptide; GPCR, G-protein-coupled receptor; LTQ, linear trap quadrupole; UniProt, Universal Protein Resource; XML, extensible markup language.
* This study was sponsored by Swedish Research Council (VR) Grants 621-2004-3417 and 521-2002-6116, an institutional grant from the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), the K&A Wallenberg Foundation, and the Karolinska Institutet Centre for Medical Innovations, Research Programme in Medical Bioinformatics.
** To whom correspondence should be addressed. Tel.: 46-18-471-7206; Fax: 46-18-471-4422; E-mail: per.andren{at}bmms.uu.se