From the UMR102 INRA/ENESAD, Genetics and Ecophysiology of Grain Legumes, F-21000 Dijon, France, ¶ Genomics of Legume Plants, Institute for Genome Research and Systems Biology, Center for Biotechnology, Bielefeld University, D-33594 Bielefeld, Germany, || UMR6175 INRA, Mass Spectrometry Platform for Proteomics, F-37380 Nouzilly, France, and the Unité de Biochimie et Structure des Protéines INRA, Mass Spectrometry Platform for Proteomics, F-78352 Jouy-en-Josas, France
| ABSTRACT |
|---|
During seed filling, the young seed coat supports storage compound synthesis in the filial tissues by transmitting organic nutrients from the phloem, mainly sugars, glutamine, and asparagine (5). Phloem sulfur supply in the form of sulfate, S-methylmethionine (SMM), and glutathione also significantly influences seed composition (6, 7). The morphology and composition of the filial organs vary greatly among species. In mature seeds of the Gramineae, the endosperm serves as the major filial storage tissue, rich in starch but with a protein content of less than 16%. In contrast, the principal storage organ of grain legumes is the cotyledon with protein contents ranging from 20% to as much as 40%, and the mature seed has little endosperm tissue. Like the major storage proteins of barley and maize grains (prolamin, glutelin), the predominant proteins of legume seeds (legumin, vicilin) are low in tryptophan (<1%) and in the sulfur-containing amino acids cysteine and methionine (<1.5%). The nature of the storage proteins that determine the amino acid composition of the total protein fraction present in seeds is genetically programmed. However, their rate of accumulation also depends on nutrient availability during seed filling (8). Because legume and cereal seeds are major human and livestock food sources, much research and breeding effort is concentrated on optimizing their nutritional value.
Although separate transcriptome and proteome analyses of developing seeds of legumes (9–11), Arabidopsis (12), and cereals (13–17) have proven invaluable in identifying changes in expression during seed filling, new strategies are needed which, by comparing transcript and protein patterns, can distinguish protein accumulation driven directly by transcript abundance from that post-transcriptionally regulated. Such proteome-transcriptome comparisons have recently been reported for human, animals, bacteria, archaea, and yeast (18–23) and are also becoming feasible in plants, especially where extensive genomic or expressed sequence information is available, such as Arabidopsis (24, 25).
Our objective was to perform a comparative study of the proteome and transcriptome at both spatial and temporal levels during seed development. We have chosen to use the annual barrel medic Medicago truncatula, a model legume characterized by a process of seed development very similar to that of other legumes, the only notable difference being a layer of endosperm remaining at maturity (26, 27). Six stages spanning important developmental phases (early seed filling to desiccation) and the three major seed tissues (seed coat, endosperm, embryo), isolated at the switch to storage product accumulation, were analyzed by proteomics and transcriptomics. Such seed tissue analyses have been performed in tomato seeds (28) or in cereal grains (14, 15, 29–31) but have not been reported for a legume species. Comparisons of proteome and transcriptome data provided novel indications as to which processes related to seed development are regulated at the level of the transcriptome and which are controlled at the proteome level. Furthermore, the seed tissue analysis revealed the partitioning of metabolic pathways between filial and maternal tissues. In particular, there is a remarkable compartmentalization of Met metabolism, which is of agronomic interest with respect to legume seeds. Complementary information derived from the transcriptomics dataset was used 1) to further investigate whether differential expression between tissues is specific to Met biosynthetic enzymes or holds more generally for other amino acids, and 2) to identify candidate genes with possible roles in the transfer of amino acids and other nutrients between the seed compartments.
| EXPERIMENTAL PROCEDURES |
|---|
4000 seeds (5 g of seeds at the 12 dap reference stage and 1 g of seeds for each of the five developmental stages ranging from 14 to 36 dap) were collected per biological replicate on a set of 10 plants for the time course analysis. For seed tissue analyses, a total of 250 seeds were collected per biological replicate on a set of 10 plants. Seed coat, endosperm, and embryo of the freshly harvested seeds were manually separated under a magnifying glass (magnification, x3.5) on Petri dishes placed on ice. Once isolated, seed tissues were weighed and immediately frozen in liquid nitrogen. From the five remaining plants of the two biological replicates, flowers, leaves, stems, and roots were collected at flowering. All seed and plant tissue samples were separately ground in liquid nitrogen using mortar and pestle. The powder was stored at –80 °C until mRNA and/or protein extraction. A third batch of plants (third biological replicate) was grown in another year under the conditions described above. A more restricted quantity of seed and tissue samples was collected from this third experiment (0.3 g per sample) to further validate the data obtained via a set of genes (see "qRT-PCR and SYBR Green Detection").
Isolation of Total RNA, Labeling, Microarray Hybridization, and Data Acquisition/Analysis—
The microarray experiments were performed using seeds of six stages ranging from 12 to 36 dap and three seed tissues (seed coat, endosperm, and embryo), each collected on two batches of plants grown independently in two subsequent years to provide two biological replicates. Four technical replicates were performed per biological replicate, including a dye swap. The Mt16kOLI1 microarrays used in this study consist of a set of 16,086 70-mer oligonucleotides (Medicago Genome Oligo Set, version 1.0, Operon) representing all tentative consensus sequences (TCs) from the Institute for Genomic Research M. truncatula Gene Index 5 (TIGR MtGI). These 70-mer oligonucleotide probes were printed in two duplicates according to Hohnjec et al. (32). Total RNA was extracted from 14 dap seed tissues (seed coat, endosperm, and embryo) and from whole seeds collected at 12, 14, 16, 20, 24, and 36 dap, with the phenol/SDS method described by Verwoerd et al. (33). As references for co-hybridizations on the microarray slides, one batch of embryo RNA and one batch of 12 dap seed RNA were prepared per biological replicate. The 12 dap stage was chosen as a common reference for the time course because it precedes the synthesis of the storage products (Fig. 1). For each hybridization, 20 µg of total RNA was reverse-transcribed, purified by ultrafiltration (Microcon YM-30, Millipore), and used to synthesize first strand cDNA targets via incorporation of aminoallyl-dUTP as described by Küster et al. (34). Reverse transcription efficiency was checked using 2 µl of the reaction mixture on 1% (w/v) agarose gels after ethidium bromide staining. Since a low quantity of cDNA was observed for 14–16 dap seeds and seed coat, presumably because of the presence of polyphenolic compounds in seed coat impeding reverse transcription, the RNA extraction procedure described by Heim et al. (35) was successfully applied to these samples. 
The cDNAs were subsequently labeled with Cy3- or Cy5-N-hydroxysuccinimide esters and purified. Labeling efficiency was checked by scanning separated Cy-coupled targets on 1% (w/v) agarose gels with a Typhoon scanner (GE Healthcare/Amersham Biosciences). The embryo reference sample was co-hybridized with the samples derived from seed coat or endosperm. For the time course analysis, the 12 dap reference sample was co-hybridized with the samples derived from the different stages of seed development. Hybridization in the Automatic Slide Processor station (GE Healthcare/Amersham Biosciences) and manual washing steps of the Mt16kOLI1 microarray slides were performed as described by Küster et al. (34). Dried slides were scanned with a pixel size of 10 µm at optimal settings using a Scanarray 4000 (PerkinElmer Life Sciences).
2-fold background (R ≤ 1 for both channels) were automatically flagged "empty." ImaGene output files were imported into the ArrayLIMS and EMMA (2.0) microarray analysis software tools (36). During import, spots flagged empty or "poor" (flag value 2 and 1) were removed. After local background subtraction and after applying a floor value of 20, the resulting signal intensities were used for data normalization using a local regression (Lowess) procedure applied globally. Subsequently, M values (log2 intensity ratios) and A values (average signal intensities) were calculated according to Dudoit et al. (37). Genes significantly up- or down-regulated were identified based on t-lists obtained from EMMA (Supplemental Table 1). Genes were regarded as being differentially expressed if signals were detectable on more than 66% of replicated spots along with p < 0.05 and M ≤ –1 or M ≥ 1 (i.e. at least 2-fold regulation) for statistical significance. Genes were manually annotated based on automatic BLAST hits of oligo-corresponding TIGR MtGI TCs in the curated databases TrEMBL, TrEMBLnew, Protein Information Resource, and Swiss-Prot as well as in the Medicago EST Navigation System database. A manual assignment of the seed-expressed genes to functional classes was performed according to the MapMan scheme (38) as described in Supplemental Table 1. Gene expression profiles selected, for example, by functional class were hierarchically clustered using the Cluster 3.0 software (39) via the average linkage method using an uncentered correlation and visualized using the Java TreeView (1.0.12) software. Data (from ImaGene output files) were submitted to ArrayExpress under accession numbers E-MEXP-904 and E-MEXP-907.
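The M and A values and the selection criteria described above can be illustrated with a short Python sketch. It is not the EMMA implementation: the function names, the choice of Cy5 as numerator, the way the floor is applied, and the example intensities and spot counts are assumptions made for illustration, and the p value is taken as given rather than computed.

```python
import math

FLOOR = 20.0  # floor value applied to background-subtracted intensities (from the text)


def m_a_values(cy5, cy3, floor=FLOOR):
    """Return (M, A) for one spot.

    M = log2(R / G) is the log2 intensity ratio and
    A = log2(sqrt(R * G)) is the average log2 intensity.
    Treating Cy5 as R and Cy3 as G is an assumption of this sketch.
    """
    r = max(cy5, floor)
    g = max(cy3, floor)
    m = math.log2(r / g)
    a = 0.5 * (math.log2(r) + math.log2(g))
    return m, a


def is_differential(m_mean, p_value, n_detected, n_spots):
    """Apply the three stated criteria: signal on more than 66% of the
    replicated spots, p < 0.05, and M <= -1 or M >= 1."""
    detected_fraction = n_detected / n_spots
    return detected_fraction > 0.66 and p_value < 0.05 and abs(m_mean) >= 1.0


# Hypothetical spot: a 4-fold higher signal in the sample channel.
m, a = m_a_values(cy5=800.0, cy3=200.0)
print(round(m, 3), round(a, 3))
print(is_differential(m, p_value=0.01, n_detected=14, n_spots=16))
```

An M value of 1 corresponds to a 2-fold higher signal in the sample than in the reference, which is why the threshold on M is equivalent to the 2-fold criterion.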
Quantitative RT-PCR and SYBR Green Detection—
Verification of differential gene expression was performed by SYBR qRT-PCR from seed and tissue samples collected on three series of plants (three biological replicates) at the stages analyzed by transcriptomics and from additional stages (8, 10, and 44 dap) and plant tissues (flowers, roots, leaves, and stems). Total RNAs were extracted by using the method described by Chang et al. (40). RNAs (20 µg) were incubated in the presence of 20 units of RNase-free RQ1 DNase (Promega, Madison, WI). Non-reverse-transcribed RNA samples were checked for absence of contaminating genomic DNA by PCR using primers for the constitutively expressed msc27 gene (TC85211) (41). Samples were reverse-transcribed using the iScript cDNA synthesis Kit (Bio-Rad, Hercules, CA), and diluted in a final volume of 1 ml. Primers were designed preferably in 3' regions of the genes to amplify fragments of 50–150 base pairs (Supplemental Table 2). The qRT-PCR reactions were carried out in duplicate in a Bio-Rad iCycler in a final volume of 25 µl containing 5 µl of diluted cDNA, 200 nM of each primer, and 12.5 µl of iQ SYBR Green Supermix for 2 min at 95 °C, 40 cycles of 20 s at 95 °C, 20 s at 60 °C, and 30 s at 72 °C. To establish the presence of a single PCR product and the absence of primer dimers, melting analysis (i.e. heat dissociation of oligonucleotides) was applied immediately after PCR by heating PCR products from 59 to 96 °C. Relative gene expression was calculated according to the relative standard curve method (CT) using M. truncatula msc27 as a constitutively expressed gene. The expression stability of the msc27 gene in the different test samples was verified by comparison with two other constitutively expressed genes encoding -tubulin (TC81141) and the eukaryotic translation initiation factor 5A-2 (TC76568) (data not shown). As a control for the data obtained, we additionally performed digital expression profiling via the expression summary tool from the TIGR MtGI 8, allowing prediction of global expression profiles based on the percentage of ESTs corresponding to a given gene in different cDNA libraries.
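For readers unfamiliar with the relative standard curve approach, the following Python sketch shows the general idea: a dilution series gives a linear relation between threshold cycle (Ct) and log10 of template quantity, sample quantities are read off that line, and the target quantity is divided by the quantity of the constitutively expressed reference gene. The exact calculation used in this study is not detailed in the text, so the function names, the dilution series, and all Ct values below are hypothetical.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a relative quantity."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, target_curve, reference_ct, reference_curve):
    """Target quantity normalized to the reference gene quantity."""
    target_q = quantity_from_ct(target_ct, *target_curve)
    reference_q = quantity_from_ct(reference_ct, *reference_curve)
    return target_q / reference_q


# Hypothetical 10-fold dilution series (relative quantities 1, 0.1, 0.01, 0.001).
dilutions = [0.0, -1.0, -2.0, -3.0]
curve = fit_standard_curve(dilutions, [20.0, 23.32, 26.64, 29.96])

# Hypothetical sample: target gene at Ct 23.32, reference gene at Ct 20.0.
print(relative_expression(23.32, curve, 20.0, curve))
```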
Total Protein Extraction and Two-dimensional Electrophoresis—
Total proteins were extracted as described by Gallardo et al. (9) from each of the seed stages and tissues analyzed in parallel with the Mt16kOLI1 microarrays, i.e. from seeds ranging from 12 to 36 dap and seed tissues at 14 dap, all collected on two series of plants grown independently in two subsequent years. From each of these two biological replicates, two independent protein extractions were performed, and two replicated two-dimensional gels were prepared from each protein extract (four technical replicates per biological replicate). Total proteins were extracted from developing seeds in 20 µl/mg of seed dry matter of thiourea/urea lysis buffer (see the corresponding seed fresh weight in Fig. 1A and Ref. 42). Total proteins of seed tissues were extracted in 2 µl of the same buffer per mg of seed fresh weight. Protein concentration was measured according to Bradford (43). Proteins were first separated by IEF using a constant volume (20 µl) of the protein extracts from developing seeds and 150 µg of proteins from seed coat, embryo, and endosperm. Proteins were separated using gel strips forming an immobilized nonlinear pH 3 to 10 gradient (Immobiline DryStrip, 24 cm; GE Healthcare/Amersham Biosciences), allowing an accurate visualization of the M. truncatula seed proteome by minimizing overlaps. Strips were rehydrated in the IPGphor system (GE Healthcare/Amersham Biosciences) for 7 h at 20 °C with the thiourea/urea lysis buffer containing 2% (v/v) Triton X-100, 20 mM DTT, and the protein extracts. IEF was performed at 20 °C in the IPGphor system for 7 h at 50 V, 1 h at 300 V, 2 h at 3.5 kV, and 7 h at 8 kV. Prior to the second dimension, each gel strip was incubated at room temperature for 2 x 15 min in 2 x 15 ml equilibration buffer as described in Gallardo et al. (44). Proteins were separated in vertical polyacrylamide gels according to Gallardo et al. (44).
Protein Staining and Image Analysis—
Gels were stained with Coomassie Brilliant Blue G-250 (Bio-Rad) according to Mathesius et al. (45). Image acquisition was done using the Odyssey Infrared Imaging System (LI-COR Biosciences, Lincoln, NE) at 700 nm with a resolution of 169 µm. Image analyses were carried out with the ImageMaster 2D Platinum version 5.0 software (GE Healthcare/Amersham Biosciences) according to the instruction manual. After matching the spots detected during the time course, a synthetic gel was created, allowing the visualization of all the polypeptides. This composite reference map was then used for protein pattern comparison during the time course and for matching with two-dimensional gels from the seed tissues. An attempt was made not to include spots where overlap with other spots was readily apparent.
Spot Volume Normalization—
We have normalized the volume of each spot (i.e. spot abundance) to total spot volume in each gel for the three seed tissues. In the context of the time course of seed development, this method of normalization is problematic because the storage proteins accumulate to 70% of total spot volume, and the proportion they represent of the total proteins changes drastically during seed filling. Therefore, this method of normalization approximates spot abundance relative to storage protein concentration. To circumvent this problem, we have used the scaling procedure described previously (9, 42), which involves the normalization, in each two-dimensional gel, of the volume of each spot to the volume of a set of housekeeping proteins, which showed little variation in intensity during seed development. This method of normalization allowed a reliable comparison with the microarray data, whose normalization is based on the expression ratios of the total number of probes using a Lowess procedure, with probes corresponding to the storage protein genes only representing a very small fraction of the total number of probes (less than one per thousand) and the corresponding signal representing a vanishingly small proportion of the total.
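The difference between the two normalization schemes can be made concrete with a small Python sketch. The spot names and volumes are invented, and scaling by the summed volume of the reference spots is an assumption of this sketch; the procedure actually used is the one described in Refs. 9 and 42.

```python
def normalize_to_total(volumes):
    """Express each spot volume as a fraction of the total spot volume of the gel."""
    total = sum(volumes.values())
    return {spot: v / total for spot, v in volumes.items()}


def normalize_to_reference(volumes, reference_spots):
    """Scale each spot volume by the summed volume of a set of reference
    (housekeeping) spots that vary little during development."""
    ref_total = sum(volumes[s] for s in reference_spots)
    return {spot: v / ref_total for spot, v in volumes.items()}


# Hypothetical gels: the enzyme spot is unchanged while storage proteins
# rise from 25% to 85% of the total spot volume.
early = {"enzyme": 10.0, "housekeeping": 20.0, "storage": 10.0}
late = {"enzyme": 10.0, "housekeeping": 20.0, "storage": 170.0}

for name, gel in (("early", early), ("late", late)):
    by_total = normalize_to_total(gel)["enzyme"]
    by_ref = normalize_to_reference(gel, ["housekeeping"])["enzyme"]
    print(name, round(by_total, 3), round(by_ref, 3))
```

With total-volume normalization the unchanged enzyme spot appears to drop 5-fold simply because storage proteins accumulate, whereas the reference-based scaling leaves it constant.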
Statistical Analyses of Protein Variations—
For each spot, the quantitative data obtained during the time course were submitted to a one-way analysis of variance using the SAS software (46). Then, a Dunnett's t test was performed to compare each stage of the time course (14, 16, 20, 24, and 36 dap) to the 12 dap stage used as reference in the transcriptomics experiments. Statistically significant differences (with 95% confidence intervals) in the quantities of individual protein spots as compared with the reference stage were identified (Supplemental Table 3). Similarly, for each spot, the quantitative seed tissue data were submitted to a one-way analysis of variance followed by a Dunnett's t test to compare the seed tissues (seed coat, endosperm) to the embryo used as reference in the transcriptomics experiments. Statistically significant differences (p < 0.05) in the quantities of individual protein spots as compared with the embryo were identified (Supplemental Table 3).
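The statistics behind this two-step procedure can be sketched in plain Python as follows. The sketch computes the one-way ANOVA F statistic and the Dunnett t statistic of each stage against the reference, using the pooled within-group variance. It does not compute p values or critical values, which require the Dunnett distribution (provided by SAS in this study), and the replicate values are hypothetical.

```python
import math


def one_way_anova(groups):
    """Return (F, MSE, df_within) for a list of groups of replicate values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_within = n_total - k
    mse = ss_within / df_within
    f_stat = (ss_between / (k - 1)) / mse
    return f_stat, mse, df_within


def dunnett_t(control, treatment, mse):
    """Dunnett t statistic: difference of means divided by its standard error,
    with the error variance pooled over all groups (MSE from the ANOVA)."""
    mean_c = sum(control) / len(control)
    mean_t = sum(treatment) / len(treatment)
    se = math.sqrt(mse * (1.0 / len(control) + 1.0 / len(treatment)))
    return (mean_t - mean_c) / se


# Hypothetical normalized volumes of one spot (four gels per stage).
ref_12dap = [1.0, 1.2, 0.8, 1.0]
stage_14dap = [1.0, 1.2, 0.8, 1.0]
stage_20dap = [2.0, 2.2, 1.8, 2.0]

f_stat, mse, df_within = one_way_anova([ref_12dap, stage_14dap, stage_20dap])
print(round(f_stat, 3), df_within)
for label, stage in (("14 dap", stage_14dap), ("20 dap", stage_20dap)):
    print(label, round(dunnett_t(ref_12dap, stage, mse), 3))
```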
Protein Identification and Comparison with Transcriptomics Datasets—
Data are reported for 224 protein spots identified by MS. Of these 224 proteins, 56 were previously identified by MALDI-TOF MS (9). In the current study, the identity of 31 proteins excised from two-dimensional gels derived from whole seeds was obtained by MALDI-TOF MS (Voyager DE super STR, Applied Biosystems, Foster City, CA) equipped with a nitrogen laser emitting at 337 nm. The excised gel plugs were destained in 50% acetonitrile and 50 mM ammonium bicarbonate (v/v). After gel-drying for 30 min, the digestion was performed in 25 µl of 50 mM ammonium bicarbonate (pH 8.0) with 0.5 µg of modified trypsin (sequencing grade, Promega, Charbonnières-les-Bains, France) for 16 h in a thermomixer (Eppendorf, Le Pecq, France) at 37 °C with vortexing at 500 rpm. One microliter of supernatant was mixed on the stainless steel MALDI plate with 1 µl of α-cyano-4-hydroxycinnamic acid (Sigma-Aldrich, Saint Quentin Fallavier, France) at 4 mg/ml in acetonitrile/TFA (50:50; v/v) 0.3% and dried at room temperature. Spectra were recorded in positive reflector mode with 20 kV as accelerating voltage, a delayed extraction time of 130 ns, and a 62% grid voltage. Mass spectra were treated by Data Explorer 4.2 (Applied Biosystems) with the following parameters: noise filter/smooth (noise removal of 2), 0.5% base peak intensity, 0.5% maximum peak area, resolution-dependent settings option, and peak resolution of 10,000. Spectra were deisotoped and internally calibrated using the autolytic trypsin fragments characterized by [M+H]+ 842.509 and 2211.104 Da. Spectral profiles were collected in the mass range 750–3500 Da. The known cluster matrix masses (47), trypsin auto-cleavable peptides, and human keratin masses (48) were removed for database searches. A search in the MtGI database (226,923 ESTs assembled in 18,612 TCs and 18,238 singletons, from 55 cDNA libraries, last release: 8.0 January 19, 2005) and in the M. truncatula translated sequences downloaded from NCBI was done using the MS-FIT program in local version (Protein Prospector v3.2.1). All peptide masses were assumed to be monoisotopic and protonated molecular ions [M+H]+. The MS-FIT search parameters were trypsin specificity, two missed cleavages, 30 ppm mass accuracy, carbamidomethylation, and other modifications (N-terminal pyroglu, oxidation of Met, protein N terminus acetylated). A match was considered significant when the sequence coverage was at least 20% with more than four nonoverlapping peptides. In most cases, the molecular weight search score was above 1e+003 (the lowest score was 297), and the number of peptides identified was high, with an average of 10 peptides per protein hit. The spectra, scores, percent matches, percent sequence coverage, peptide masses, and mass errors are provided in Supplemental Table 3.
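The two numerical criteria used for the peptide mass fingerprints (a mass accuracy of 30 ppm and a sequence coverage of at least 20% with more than four nonoverlapping peptides) can be written out as a brief Python sketch. This is not part of MS-FIT; the function names, the protein length, and the peptide positions are hypothetical and serve only to show how the quantities are defined.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def sequence_coverage(protein_length, peptide_spans):
    """Percent of residues covered by matched peptides.
    Spans are 1-based inclusive (start, end) residue positions."""
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


def count_nonoverlapping(peptide_spans):
    """Largest number of matched peptides that share no residue
    (greedy selection by earliest end position)."""
    count = 0
    last_end = 0
    for start, end in sorted(peptide_spans, key=lambda span: span[1]):
        if start > last_end:
            count += 1
            last_end = end
    return count


def is_significant_match(protein_length, peptide_spans):
    """Coverage of at least 20% with more than four nonoverlapping peptides."""
    return (sequence_coverage(protein_length, peptide_spans) >= 20.0
            and count_nonoverlapping(peptide_spans) > 4)


# Hypothetical 300-residue protein matched by five tryptic peptides.
spans = [(1, 12), (20, 35), (50, 62), (100, 115), (200, 214)]
print(sequence_coverage(300, spans), count_nonoverlapping(spans))
print(is_significant_match(300, spans))

# A hypothetical observed mass compared with the 842.509 Da trypsin calibrant.
print(abs(ppm_error(842.530, 842.509)) <= 30.0)
```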
Subsequently, a total of 137 additional spots, including those that could not be identified unambiguously by MALDI-TOF and spots derived from the separated seed tissues and/or from the whole seeds, were subjected to nano-LC-MS/MS sequencing. Protein spots were excised from two-dimensional gels and reduced with 10 mM DTT for 45 min at 56 °C, alkylated with 55 mM iodoacetamide for 30 min in the dark at room temperature, and incubated overnight at 37 °C with 12.5 ng/µl trypsin (sequencing grade; Roche Applied Science) in 25 mM ammonium bicarbonate. The tryptic fragments were extracted, dried, reconstituted with 2% (v/v) acetonitrile and 0.1% formic acid, and sonicated for 10 min. Tryptic peptides were sequenced by nano-LC-MS/MS (Q-TOF-Ultima Global equipped with a nano-ESI source coupled with a Cap LC nanoHPLC, Waters Micromass, Waters, Saint Quentin en Yvelines, France) in the Data Dependent Acquisition mode allowing the selection of three precursor ions per survey scan. Only doubly and triply charged ions were selected for fragmentation over a mass range of m/z 400–1300. A spray voltage of 3.2 kV was applied. The peptides were loaded on a C18 column (Atlantis dC18, 3 µm, 75 µm x 150 mm Nano Ease, Waters) and eluted with a 5–60% linear gradient with water/acetonitrile 98/2 (v/v) containing 0.1% formic acid (buffer A) and water/acetonitrile 20/80 (v/v) containing 0.1% formic acid (buffer B) over 30 min at a flow rate of 200 nL/min. MS/MS raw data were processed (smooth 3/2 Savitzky Golay and no deisotoping) using the ProteinLynx Global Server 2.05 software (Waters), and peak lists were exported in the micromass pkl format. 
Peak lists of precursor and fragment ions were matched automatically to proteins in the NCBI nonredundant database (March 16, 2007: 4,761,919 sequences, 1,643,098,755 residues) and the GenBank Viridiplantae other than Arabidopsis and Oryza sativa EST database (April 29, 2006: 140,695,050 sequences, 26,889,187,900 residues) using the MASCOT version 2.2 program (Matrix Science, London, UK). We first performed a search in the NCBInr database and then in the GenBank EST database, where we applied a restriction to Viridiplantae other than Arabidopsis and Oryza sativa because of sequence redundancy between the two databases. The MASCOT search parameters were trypsin specificity, one missed cleavage, carbamidomethyl Cys and oxidation of Met, and 0.2 Da mass tolerance on both precursor and fragment ions. All proteins identified have a MASCOT score above the significance level corresponding to p < 0.05. To validate protein identification based on multiple peptides, only matches with individual ion scores above 20 were considered. In most cases, at least two different nonoverlapping peptide sequences of more than 6 amino acids with a mass tolerance <0.05 Da were obtained. Moreover, among the positive matches based on one unique peptide, only spectra containing a series of at least 5 consecutive y or b ions with individual ion scores above a threshold value calculated by the MASCOT algorithm with our search parameters were accepted (identity threshold of 41 for the NCBInr database and of 60 for the GenBank EST database for Viridiplantae other than Arabidopsis and Oryza sativa). These validation criteria are a good compromise to limit the number of false positive matches without missing real proteins of interest. When the same set of peptides matched different EST accessions, we retained the M. truncatula ESTs and checked that they belong to the same contig from the TIGR MtGI. Of the 137 spots analyzed, 123 were unambiguously identified.
The protein sequences were subjected manually to BLAST searches against the MtGI database to identify the corresponding TC sequence. The peptide sequences from nano-LC-MS/MS with accession number, description, protein and peptide MASCOT scores, MS/MS fragmentation for unique peptide-based identifications, sequence coverage, and BLAST probability scores are provided in Supplemental Table 3.
The protein spots that did not fit these validation criteria (14 of 137) were reanalyzed by HPLC coupled with an LCQ Deca XP+ ion trap (Thermo Electron, San Jose, CA) using a nano electrospray interface according to Méchin et al. (17). Ionization (1.3–1.5 kV ionization potential) was performed with liquid junction and a noncoated capillary probe (10-µm inner diameter; New Objective). Peptide ions were analyzed using Xcalibur 1.4 (Thermo Electron) with the following data-dependent acquisition steps: 1) full MS scan (m/z ratio 400–1900, centroid mode), 2) ZoomScan on a selected precursor (scan at high resolution in profile mode on a m/z window of 4), and 3) MS/MS (stability parameter = 0.22, 50 ms activation time, 40% collision energy, centroid mode). Steps 2 and 3 were repeated for the two major ions detected in step 1. Dynamic exclusion was set to 30 s. A search in the MtGI database (226,923 ESTs clustered on 36,878 TCs, from 55 cDNA libraries, last release: 8.0 January 19, 2005) was performed with Bioworks 3.1 (Thermo Electron). Trypsin digestion, Cys carboxyamidomethylation, and Met oxidation were set to enzymatic cleavage, static, and possible modifications, respectively. Precursor mass and fragment mass tolerance were 1.4 Da and 1 Da, respectively. Identified tryptic peptides were filtered according to their cross-correlation score (Xcorr) higher than 1.7, 2.2, and 3.3 for mono-, di-, and tri-charged peptides, respectively. A minimum of two different peptides was required. In the case of identification with two or three MS/MS spectra, similarity between the experimental and the theoretical MS/MS spectra was visually confirmed. The peptide sequences with accession number, cross-correlation scores, ΔCn, and percent sequence coverage are provided in Supplemental Table 3.
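The charge-dependent Xcorr filter and the two-peptide requirement stated above amount to a simple rule, sketched here in Python. This is not Bioworks code; the tuple layout, function names, and peptide sequences are hypothetical.

```python
# Minimum Xcorr per charge state, as given in the text.
XCORR_MIN = {1: 1.7, 2: 2.2, 3: 3.3}


def passing_peptides(psms):
    """Return the distinct peptide sequences whose Xcorr is higher than the
    threshold for their charge state. Each PSM is (sequence, charge, xcorr)."""
    kept = set()
    for sequence, charge, xcorr in psms:
        threshold = XCORR_MIN.get(charge)
        if threshold is not None and xcorr > threshold:
            kept.add(sequence)
    return kept


def protein_identified(psms, min_peptides=2):
    """Require a minimum number of different peptides passing the filter."""
    return len(passing_peptides(psms)) >= min_peptides


# Hypothetical peptide-spectrum matches for one protein spot.
psms = [
    ("LSVDEAEK", 2, 2.9),     # passes (2.9 > 2.2)
    ("LSVDEAEK", 3, 3.0),     # same peptide, fails at charge 3
    ("GFTNPLLEAVR", 2, 2.1),  # fails (2.1 < 2.2)
    ("TAGIQIVADDLTVTNPK", 3, 3.6),  # passes (3.6 > 3.3)
]
print(sorted(passing_peptides(psms)))
print(protein_identified(psms))
```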
The identification of M. truncatula protein sequences allowed us to search for the corresponding genes on the Mt16kOLI1. To compare protein and transcript patterns, protein accumulation values were expressed as log2 ratios of spot abundance in the seed samples relative to that in the reference sample (12 dap for the time course and embryo for the tissues). Protein patterns and the corresponding transcript profiles were hierarchically clustered using the Cluster 3.0 software (39) with the commonly used average linkage method and were visualized using Java TreeView (1.0.5), except for the genes displaying no significant regulation during the time course (–1 < M < 1), which were independently subjected to a hierarchical clustering. Pearson correlation coefficients (r) were used to evaluate the levels of correlation between transcript and protein profiles during the time course of seed development and in seed tissues (Statistica software, version 7.0, StatSoft). An r value of 1.0 indicates perfect correlation, whereas a value of 0 indicates no correlation, and r = –1 indicates perfect anti-correlation. A p value <0.05 was considered to be statistically significant. The transcriptome/proteome comparisons made, including the Pearson correlation coefficients, are available in Supplemental Table 4.
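As a minimal illustration of this comparison, the Python sketch below converts spot abundances to log2 ratios relative to the reference sample and computes the Pearson correlation coefficient with the corresponding transcript M values. The abundance and M values are hypothetical, the function names are ours, and the significance test performed in Statistica is not reproduced.

```python
import math


def log2_ratios(sample_values, reference_value):
    """log2 ratio of each sample abundance to the reference abundance."""
    return [math.log2(v / reference_value) for v in sample_values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical spot abundances at 14, 16, 20, 24, and 36 dap, with the
# abundance at 12 dap (the reference stage) set to 1.0.
protein_profile = log2_ratios([2.0, 4.0, 8.0, 16.0, 32.0], reference_value=1.0)

# Hypothetical transcript M values for the matching probe at the same stages.
transcript_profile = [0.5, 1.0, 1.5, 2.0, 2.5]

print(protein_profile)
print(round(pearson_r(protein_profile, transcript_profile), 3))
```

Because r measures the similarity in shape of the two profiles rather than their amplitude, a protein and a transcript that rise in parallel give r close to 1 even when their fold changes differ.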
| RESULTS |
|---|
Of the 790 polypeptides profiled during seed development, 615 varied significantly at characteristic stages as compared with the 12 dap stage. These variations are accompanied by significant changes in the transcriptome because about 50% of the transcripts (~4800 mRNAs) detected in developing seeds varied in abundance. A complete list of all genes significantly regulated (≥2-fold, p < 0.05) at either the transcript or protein level, having been reannotated and classified into functional classes as defined by the MapMan ontology (38), is provided in Supplemental Tables 1 and 3. As expected, the number of differentially accumulated proteins and transcripts increased from 14 to 24 dap, corresponding to the seed filling phase. Strikingly, a 2-fold increase in the total number of up-regulated transcripts was observed between 24 dap and physiological seed maturity (36 dap), whereas the number of up-regulated proteins detected did not increase in the same period. Interestingly, the most marked changes are in genes related to transcription and RNA processing (class #27, Supplemental Fig. 2 and Supplemental Table 1). These transcripts presumably contribute to the stored mRNA pool used for protein synthesis during germination (49). They represent potential indicators of germination performance.
Comparison of Protein Profiles with Transcript Patterns during Seed Development—
The proteome and transcriptome profiles were compared to investigate at which level protein accumulation is regulated, i.e. if regulation takes place at the level of the transcript or rather at the level of the protein itself (protein degradation and other post-translational modifications). A total of 208 spots matched to TCs corresponding to oligonucleotide probes on the Mt16kOLI1 microarray. For 156 of these, RNA and protein expression profiles throughout seed development were obtained. The protein and transcript profiles, with values expressed as mean of log2 ratios using the 12 dap sample as reference, were hierarchically clustered. Nine distinct groups of profiles were identified that reflect different modes of regulation (Fig. 2).
Group 3 is composed of 33 protein spots whose abundance decreases from the early stages of seed filling onwards. A highly significant correlation (r > 0.9, p < 0.05) between transcript and protein levels was found for 13 genes in this group, and for 14 further genes, the r value exceeded 0.5 (Supplemental Table 4). Many of them are related to cytoskeletal organization in the cell (tubulin and actin), redox regulation, abiotic stresses, glycolysis, and Met metabolism. Among enzymes of Met metabolism identified were a Met synthase and two isoforms of S-adenosylmethionine (AdoMet) synthetase, previously shown to be associated with the status of metabolic activity in seeds (9, 51, 52). Their decreased levels would be consistent with the switch from active metabolism to a quiescent state. Finally, group 4 includes proteins whose accumulation did not vary significantly during seed development (–1 < M < 1, i.e. less than 2-fold regulation), suggesting a continuing function throughout this process. Many are molecular chaperones involved in protein folding either in the plastid (ribulose bisphosphate carboxylase-oxygenase (RuBisCO) subunit-binding proteins) or in the mitochondrion (heat shock protein family members), and several are required for protein synthesis (RNA helicase, elongation factor 1B -subunit).
Groups 5 and 6, Sequences Displaying Apparent Preferential Transcript Turnover—
Fifteen sequences of 36 have a correlation coefficient between transcript and protein patterns below 0.5 reflecting different degrees of stability (Supplemental Table 4). The levels of these proteins remain constant while transcript abundance decreases. They include those essential for mitochondrial and chloroplast function during seed development. Some are present throughout seed filling (group 5), such as proteins involved in photosynthesis (two oxygen evolving enhancer proteins and a chlorophyll a/b binding protein), a process shown to produce oxygen for respiration and reserve synthesis in developing seeds (5). Others are present up to desiccation (group 6), such as the mitochondrial processing peptidase ensuring the continued functioning of the mitochondrion.
Group 7, Sequences Displaying Apparent Preferential Protein Turnover—
The proteins whose abundance decreases while the transcript level remains constant are involved in amino acid metabolism (glutamine synthetase, Met synthase, and S-adenosylhomocysteine (AdoHcy) hydrolase) or related to stress, suggesting that the transcripts encoding these proteins may form part of a pool of stored mRNAs that could be reutilized during seed development (e.g. in response to a stress) or later during germination. Further studies are needed to characterize the occurrence of protein and/or transcript turnover by using radiolabeled precursors for their synthesis (49, 53).
Groups 8 and 9, Protein and Transcript Poorly Correlated—
Our study revealed several proteins displaying a profile differing from that of the corresponding transcripts during seed development (r < 0.5 for 21 sequences of 29). The majority of these proteins are involved in the regulation of protein synthesis (translation elongation factors) either in the plastid or in the cytosol, in oxidative stress or in detoxification (e.g. catalase and glutathione S-transferase). The underlying causes for the association of genes observed in groups 8 and 9 will be the subject of further investigation.
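The grouping of sequences by the agreement between transcript and protein profiles can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation coefficient computed over the sampled stages and using only the two r thresholds quoted in the text (0.9 and 0.5). The class labels, function names, and profiles are hypothetical, and the significance test (p value) applied in the study is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def concordance(r):
    """Bin r with the thresholds quoted in the text (0.9 and 0.5)."""
    if r > 0.9:
        return "highly correlated"
    if r > 0.5:
        return "moderately correlated"
    return "poorly correlated"


# Hypothetical log-ratio profiles over six sampling stages (not study data).
transcript = [2.0, 1.5, 0.8, 0.1, -0.9, -1.6]
protein_a = [1.8, 1.6, 0.9, 0.3, -0.5, -1.2]  # declines with the transcript
protein_b = [0.1, -0.2, 0.2, 0.0, 0.1, -0.1]  # stays flat while transcript falls

for name, prot in (("protein_a", protein_a), ("protein_b", protein_b)):
    r = pearson_r(transcript, prot)
    print(name, round(r, 2), concordance(r))
```

In this sketch, a profile such as `protein_b`, which remains roughly constant while the transcript declines, falls in the poorly correlated class, the situation described for groups 5 and 6.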
Protein and Transcript Distribution in Seed Coat, Endosperm, and Embryo—
We have examined the distribution of the proteins and transcripts in the three major seed tissues, seed coat, endosperm, and embryo, at the onset of seed filling (14 dap, Fig. 1A). Protein distribution in the three tissues was investigated by using as reference the high resolution composite two-dimensional map established from the time course data (Supplemental Fig. 1), and transcript distribution was studied by cohybridizing, on the Mt16kOLI1 microarrays, cDNAs derived from either the seed coat or the endosperm with embryo cDNAs. To allow a comparison between proteome and transcriptome data, protein spot abundance in the seed coat or endosperm was compared with that in the embryo reference tissue. A classification of the genes differentially expressed (at least 2-fold with a p value <0.05) at the transcript or protein level in embryo versus seed coat or endosperm is presented in Supplemental Fig. 3, and a complete list is provided in Supplemental Tables 1 and 3.
Despite the difficulty of collecting separated tissue components from seeds as small as those of M. truncatula at an early stage of development, we identified a large number of proteins and transcripts enriched in the separated tissues. The proteomes of the three tissues differ markedly, with 95% of the spots detected being differentially accumulated (at least 2-fold, p < 0.05) in embryo versus seed coat or endosperm (Fig. 3). The microarray data revealed 718, 1382, and 2029 genes predominantly expressed in endosperm, seed coat, and embryo, respectively (about one-third of all genes expressed in these tissues). The effectiveness of seed component separation was confirmed by the detection of marker gene products known to be associated with a particular seed tissue. For example, the seed coat peroxidase was not detected, at either transcript or protein level, in the endosperm and embryo extracts (spot 261 in Fig. 3A, expressed 10-fold higher in seed coat than in embryo), and storage proteins and their corresponding transcripts were preferentially accumulated in embryo cells (expressed 4-fold higher in embryo, Fig. 3A). This study carried out at the seed tissue level yielded a number of novel findings, including the identification of endosperm-specific gene products, few of which have been reported to date in legumes (54).
-tonoplast intrinsic protein (-TIP) aquaporins, Fig. 5D). The qRT-PCR data from developing seed and tissues collected on three series of plants (Supplemental Table 2) were consistent with those obtained using microarrays, thus confirming the robustness of our results, but the dynamic range of induction measured for a sequence was 3-fold higher with qRT-PCR, a common observation as mentioned previously (34). The distribution of these transcripts within the plant was examined in further qRT-PCR experiments (Figs. 4B and 5A–D). Eight genes of 11 were preferentially expressed in immature seeds compared with the other plant organs (e.g. protein transporter, sulfate transporter, and subtilisin), implying a specialized function in developing seeds. Five of these seed-specific genes were expressed in embryo cells during storage protein deposition. One of them has high sequence similarity (73%) to the Tim17:22 gene family of protein transporters found in the mitochondrial inner membrane (Fig. 5C). Knowing the central role of mitochondria in storage cells of developing seed (5), it would be of interest to investigate the specialized function of this putative translocase. The four other genes encode storage proteins and -TIPs, aquaporins regulating the flux of water and solutes across the storage vacuole membranes (55). The two -TIP genes were expressed concomitantly with legumins, suggesting a role in the deposition of storage proteins (Figs. 1B and 5D).
| DISCUSSION |
|---|
An original finding of our study is that amino acid metabolism genes are among the most highly regulated in the M. truncatula seed. By specifying the distribution of the proteins and transcripts in the seed coat, endosperm, and embryo at the switch toward storage functions, we observed a compartmentalization of Met metabolism between seed tissues that is of agronomic interest. These results are discussed hereafter in light of the proteome-transcriptome comparison. We also relied on complementary information derived from our transcriptome dataset, for example, 1) to specify whether differential expression between tissues is specific to Met biosynthetic enzymes or holds more generally for other amino acids, and 2) to identify candidate genes with possible roles in the transfer of nutrients/amino acids between the seed compartments.
The Significance of the Complex Compartmentalization of Amino Acid Metabolic Pathways between the Seed Tissues: Example of Met Metabolism—
The most novel findings of the study relate to the compartmentalization of metabolic pathways in component tissues of the seed, in particular those concerned with the biosynthesis of certain amino acids. A differential expression of enzymes involved in Met metabolism between seed tissues has been revealed, reflecting a partitioning of biosynthetic metabolic pathways. A Met synthase (spot 431), producing Met from homocysteine (Hcy), and its transcript (TC85287) were preferentially accumulated within the seed coat (Fig. 6). Protein and RNA abundance both decreased in linear correlation (r = 0.95, p = 0.01) as seed filling progressed (group 3 in Figs. 2 and 6). Another Met synthase detected at the protein level in the seed coat and endosperm (spot 9, Fig. 3) also decreased during this period. This implies a decrease in Met synthesis from Hcy in the tissues surrounding the embryo during seed filling. Other genes related to Met metabolism were also preferentially expressed within the seed coat, in particular a gene encoding Hcy S-methyltransferase that catalyzes the irreversible conversion of SMM to Met (Fig. 6). The transcript increases strikingly in relative abundance at 20 dap, parallel to the massive accumulation of the major storage proteins (Fig. 1B), and decreases subsequently as the embryo prepares for quiescence. The predominance of the Hcy S-methyltransferase transcript over those encoding Met synthase suggests that the production of free Met at this stage is from SMM rather than from the pathway involving Met synthase.
SMM, a transport and storage form of Met unique to plants (56), may be imported directly from the phloem, or it could be synthesized within the seed by a methyl transfer from AdoMet to Met via the action of AdoMet:Met S-methyltransferase (SMM cycle). The AdoMet precursor, needed for the SMM cycle to be active, is synthesized from Met by the action of AdoMet synthetase. The dramatic decrease during seed filling of the AdoMet synthetase transcripts TC89089 (endosperm-located) and TC85200 (endosperm- and seed coat-located) and of their corresponding proteins (group 3 in Figs. 2 and 6), together with a report of high concentrations of SMM in the phloem (6), suggests that seed SMM is mainly derived from the phloem rather than from an intraseed SMM cycle. It further suggests that Met synthesis occurs within the seed coat from this pool of SMM, thus providing free Met for transfer to the embryo and incorporation into storage proteins. This may avoid depletion of the embryo Cys pool otherwise sourced for Met synthesis.
Met synthase and AdoMet synthetase (group 3, Fig. 2) are fundamental in controlling the transition from a quiescent to a highly active metabolic state during germination (51). The disappearance of these enzymes during seed development is consistent with previous data obtained from the entire seed or isolated filial tissue (9–11, 17, 57). Their restriction in the seed coat and endosperm to the onset of seed filling reflects a metabolic shift in these tissues from a highly active to a quiescent state as the embryo accumulates the storage compounds. Such a metabolic shutdown has been inferred for other types of metabolism (3, 11). Further evidence for a reduction in metabolic activities is the sharp decrease of endosperm-located glycolytic enzymes (e.g. phosphoglycerate kinase, phosphoglyceromutase, and aldolase) and of their transcripts during seed filling (group 3, Fig. 2).
We further analyzed the proteins related to Met metabolism that were specifically accumulated in embryo cells. Among them was an AdoHcy hydrolase (Fig. 3A). This enzyme converts AdoHcy, resulting from AdoMet-dependent methylation of nucleic acids, proteins, lipids, and other metabolites to Hcy (58, 59). Since AdoHcy produced during the activated methyl cycle is a potent competitive inhibitor of methyltransferases that are essential for cell growth and development, AdoHcy hydrolase (i.e. depletion of the AdoHcy pool) may be crucial to ensure transmethylations in embryo cells during seed filling. In support of this hypothesis, the AdoHcy hydrolase level remained high up to 24 dap and then decreased as the seed entered the desiccation phase (Fig. 6). Furthermore, transcripts of several genes encoding AdoMet-dependent methyltransferases were detected in embryo cells (e.g. TC79186 and TC86500 in Fig. 6). Interestingly, in addition to the seed coat and endosperm-located Met synthase, another isoform of Met synthase (TC89672) was expressed with a similar profile to AdoHcy hydrolase (group 7, Fig. 2). Although the corresponding transcript was present in all three seed tissues, the protein was only detected in the endosperm and to a higher extent in the embryo where it may regenerate Met from Hcy produced in the course of the activated methyl cycle (Fig. 6). Because this Met synthase isoform is highly expressed up to 20 dap, it may contribute to Met synthesis up to this stage. Moreover, the observation that AdoHcy hydrolase and Met synthase transcripts persist at the end of seed development is consistent with the finding that both the de novo Met biosynthetic pathway and the activated methyl cycle operate very early during seed germination (59).
Based on these data, a model for Met synthesis in developing seeds is proposed (Fig. 6), in which Met is synthesized within the embryo from Hcy produced in the course of the activated methyl cycle up to mid-stages of seed filling (20 dap). To meet the high demand for protein synthesis during seed filling, Met is also synthesized in the seed coat from the pool of SMM and subsequently transported to the embryo. In support of this model, Hcy S-methyltransferase, which synthesizes Met from SMM, showed a higher activity than the SMM cycle enzyme AdoMet:Met S-methyltransferase in developing Arabidopsis seeds, and the incorporation of 35S into protein was the same whether [35S]Met or [35S]SMM was supplied (60).
Compartmentalization of Other Amino Acid Biosynthetic Pathways—
To investigate whether differential expression between tissues is specific to Met biosynthetic enzymes or holds more generally for other amino acids, we have further used information from the transcriptomics dataset (Fig. 7). Genes encoding asparagine synthase and L-asparaginase, involved in the interconversion of asparagine/aspartate, are mainly expressed in the seed coat and the endosperm. This suggests that these tissues are metabolizing asparagine, which is one of the principal amino acids delivered by the phloem, before translocation to the embryo (Fig. 6). These genes are also preferentially expressed in the pea seed coat (2). The seed coat is also the site of expression of the gene encoding glutamate decarboxylase (14 in Fig. 7; see also Fig. 6) that converts glutamate to γ-aminobutyrate, which has been associated with various physiological responses, including regulation of carbon flux into the tricarboxylic acid cycle (61). In contrast, a number of genes are apparently expressed preferentially in embryo cells, such as those encoding enzymes of tryptophan, valine, isoleucine, and arginine biosynthesis. Significantly, all of the detected genes encoding enzymes of the Arg biosynthetic pathway were preferentially expressed in embryo cells. Recently, it has been shown that the key regulatory enzyme of Arg synthesis, N-acetyl glutamate kinase (25 in Fig. 7A), interacts with the PII protein to relieve inhibition of the kinase activity by Arg (62). Here, we show that the pII gene (TC88736) is highly expressed in embryo cells at 14 and 16 dap, concomitantly with storage protein synthesis (Fig. 7), and may therefore be required for a rate of Arg synthesis sufficient for incorporation into storage proteins, which contain 10% Arg residues.
Another finding was that the seed compartments accommodate two distinct pathways of sulfur assimilation. Transcripts for the first enzyme of sulfate reduction (ATP sulfurylase, 1 in Fig. 7) were detected in both the seed coat (TC76360) and the embryo (TC88355) at the onset of seed filling. This enzyme catalyzes the formation of 5'-adenylyl sulfate (APS) from SO₄²⁻, which can be utilized in two pathways. In the first pathway, APS is converted by the action of APS kinase (2) to 3'-phosphoadenylyl sulfate, a sulfur donor for the synthesis of defense-related secondary metabolites, including glucosinolates (63). In the second pathway, APS is reduced to provide sulfide ions for Cys synthesis, through two reactions catalyzed by adenosine 5'-phosphosulfate reductase (3) and sulfite reductase (4), respectively (Fig. 7B). The genes encoding enzymes involved in APS reduction and sulfide incorporation into Cys (Cys synthase, 5) are expressed in both the seed coat and the embryo, whereas the genes encoding APS kinase (2) and glutathione synthetase (6) are expressed specifically within the seed coat and the endosperm. These findings suggest that sulfate in the tissues surrounding the embryo is mainly incorporated into glutathione and defense-related secondary metabolites, whereas most of the sulfate entering the embryo is utilized for the synthesis of Cys. This compartmentalization, along with that observed for the Met biosynthetic pathways, may avoid a depletion of the embryo Cys pool in favor of its incorporation into proteins during seed filling (Fig. 6). The partitioning of these metabolic pathways provides new directions for genetic improvement of grain legumes, low in the essential sulfur-rich amino acids, to enhance value for human consumption and animal feed.
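The reasoning behind this partitioning can be restated as a small lookup, given here only as an illustrative sketch. The tissue sets transcribe the expression sites stated in the preceding paragraph; the dictionary layout, branch labels, and function name are hypothetical. Glutathione synthetase is left out because the other enzymes of glutathione synthesis are not described in the text, and a tissue missing from a set means only that expression was not reported above, not that it was shown to be absent.

```python
# Tissues in which transcripts for each enzyme are reported in the text above.
# Absence from a set means "not reported here", not "proven absent".
EXPRESSION = {
    "ATP sulfurylase": {"seed coat", "embryo"},
    "APS kinase": {"seed coat", "endosperm"},
    "APS reductase": {"seed coat", "embryo"},
    "sulfite reductase": {"seed coat", "embryo"},
    "Cys synthase": {"seed coat", "embryo"},
}

# Enzymes required for each branch that starts from sulfate.
BRANCHES = {
    "3'-phosphoadenylyl sulfate": ["ATP sulfurylase", "APS kinase"],
    "Cys synthesis": [
        "ATP sulfurylase",
        "APS reductase",
        "sulfite reductase",
        "Cys synthase",
    ],
}


def supported_branches(tissue):
    """List the branches for which every required enzyme is reported in a tissue."""
    return [
        branch
        for branch, enzymes in BRANCHES.items()
        if all(tissue in EXPRESSION[enzyme] for enzyme in enzymes)
    ]


for tissue in ("seed coat", "endosperm", "embryo"):
    print(tissue, supported_branches(tissue))
```

Under these assumptions the embryo supports only the branch leading to Cys, whereas the seed coat carries the transcripts for both branches. The endosperm returns no complete branch simply because ATP sulfurylase transcripts are reported above for the seed coat and embryo only.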
An Intraseed Nitrogen Remobilization May Contribute to Storage Protein Accumulation—
Among the metabolic events occurring at the switch to seed filling is protein turnover in the seed coat and endosperm, which releases amino acids for the developing embryo. It has been shown that whole seeds cultured in vitro in the absence of exogenous nitrogen are able to accumulate embryo storage proteins by recycling nitrogenous compounds from the embryo-surrounding tissues (64), whereas isolated embryos are not. Several proteases, expressed preferentially either in the endosperm (aspartic protease 1, oligopeptidase A, and serine proteases) or in the seed coat (20S proteasome subunit, leucine aminopeptidase, and serine protease) (Supplemental Table 3), are good candidates for involvement in this process of remobilization (Fig. 6). Three endosperm-associated subtilisin-like spots (75, 132, and 133, Figs. 3B and 4) matched one gene sequence accessible in the BAC clone mth2–12j18 (GenBank AC146561.9). The predicted molecular masses and pI values of the polypeptides are consistent with co- and post-translational modifications of a prepro-enzyme (65). The transcript and protein spots peaked in the prestorage phase and were not detected in other tissues (Fig. 4). In contrast, a seed coat-associated subtilisin increased from the onset of seed filling (14 dap) and reached a maximum level at 20 dap when endosperm reserves were consumed (spot 131 in Fig. 3B, Fig. 4A, group 2 in Fig. 2). The subtilisins detected in the endosperm and in the seed coat are encoded by different genes and display contrasting protein patterns during seed development, suggesting that they may play distinct roles in each tissue. Moreover, the linear correlation between transcript and protein levels (r = 0.9, p < 0.05, Supplemental Table 3) suggests that the amount of subtilisin-like serine proteases in developing seeds is controlled by transcript abundance.
The profile of expression of the seed coat-associated subtilisin is identical to that of a cell wall-related pectin acetylesterase expressed in the same tissue (spot 593, group 2 in Fig. 2). As there are reports of matrix-associated plant subtilisins (66–68), a role in cell wall modification is possible.
Transporters Implied in Nutrient Import and Intraseed Translocations—
The exchange of metabolites between seed tissue compartments implies a need for the corresponding transporters. The transcriptome study revealed more than 90 genes encoding transporters potentially involved in the uptake of nutrients or in their translocation within the seed (Fig. 5). Many of them were preferentially expressed in the embryo-surrounding tissues and are thus probably involved in the uptake of nutrients from the phloem or in the efflux of endogenous nutrients en route to the embryo. For example, among the 17 identified genes encoding amino acid transporters, 10 were preferentially expressed in seed coat and/or endosperm, such as the seed coat-derived amino acid transporter TC78342. One transcript (TC90046) encodes a mitochondrial carrier protein with high sequence similarity (72%) to the Arabidopsis transporter SAMC1, proposed to catalyze the uptake of AdoMet in exchange for AdoHcy to regulate methyltransferase activities in mitochondria and plastids (69). This gene was preferentially expressed in seed coat and endosperm and reached a maximum at 20 dap, presumably to transport the remaining pool of AdoMet needed for methyltransferase activities (Fig. 6). In addition, we identified five sulfate transporters preferentially expressed in the embryo (TC86732, TC83848, and TC92748) or in the seed coat (TC93814 and TC78211). TC78211 is preferentially expressed in developing seeds compared with other plant organs and at much higher levels during the prestorage phase (Fig. 5B). The protein encoded shows 78% sequence similarity to the Arabidopsis transporter SULTR3;5, which facilitates sulfate transport in the vascular system (70). The relative contribution of sulfate and SMM import to seed sulfur metabolism merits further investigation.
Conclusions—
In addition to documenting the relationship between mRNA and protein patterns from early stages of seed filling to desiccation, this study has revealed a specialization of the filial and maternal tissues, in particular regarding Met and sulfur metabolism. This specialization may be controlled not only by the tissue-specific genetic programs of either maternal (seed coat) or zygotic (embryo) origin, but also by the energy and oxygen status of the different tissues. Because of their location, the embryos grow in an environment of low light and oxygen availability (71), which may affect ATP production and biosynthetic activities. As a way to control biosynthetic fluxes, embryos become photosynthetically active during maturation (5, 12). In developing seeds of soybean, photosynthesis is followed by an increase in ATP levels, most prominently within the inner regions of the embryo where biosynthetic activities occur. Several proteins from both photosynthetic light reactions (oxygen evolving enhancer and chlorophyll a/b-binding proteins) and dark reactions (RuBisCO and RuBisCO subunit binding proteins) were detected in the M. truncatula developing embryo (