Molecular & Cellular Proteomics 1:561-566, 2002.
© 2002 by The American Society for Biochemistry and Molecular Biology, Inc.
Division of Genome Biology, Cancer Research Institute, Kanazawa University, Kanazawa 920-0934, Japan
Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Corporation (JST), Tokyo 102-0081, Japan
** INTEC Web and Genome Informatics Corporation, Tokyo 136-0075, Japan
| ABSTRACT |
|---|
| WHY PROTEIN INTERACTOME? |
|---|
| THE PRINCIPLE OF THE YEAST TWO-HYBRID SYSTEM |
|---|
The Y2H system enables highly sensitive detection of protein-protein interactions in vivo without handling any protein molecules. It also allows one to screen a library of activation domain fusions, or preys, for the binding partners of one's favorite protein expressed as a DNA binding domain fusion, or bait, and it can be used to pinpoint the protein regions mediating the interactions.
On the other hand, the Y2H system has limitations. First, in principle, it cannot detect interactions requiring three or more proteins or those depending on posttranslational modifications. Note, however, that when applied to the budding yeast itself, it can occasionally detect interactions involving three proteins or posttranslational modifications with the aid of endogenous third proteins or modifying enzymes. Second, the Y2H system is not suitable for the detection of interactions involving membrane proteins, although a substantial number of such interactions have been detected via an unexplained mechanism. Finally, a Y2H interaction does not guarantee that the inferred interaction is of physiological relevance. Despite these and other limitations, the power of the Y2H system is so tremendous that it is now established as a standard technique in molecular biology.
The Y2H system has been successfully used to examine an interaction between two proteins of one's interest and also to screen for unknown binding partners of one's favorite protein. It can, in principle, be used in a more comprehensive fashion to examine all possible binary combinations between the proteins encoded by any single genome. Three groups (ours, CuraGen's, and Fields's) launched such ambitious projects using the budding yeast as the target.
| GENOME-WIDE Y2H ANALYSIS OF THE BUDDING YEAST |
|---|
, respectively. Bearing opposite mating types, a bait clone and a prey clone can mate to form diploid cells. Consequently, each diploid cell has a unique combination of bait and prey. If they interact, the reporter genes are activated to allow the cells to survive the selection. In other words, each survivor should bear a pair of mutually interacting bait and prey, which can be revealed by tag sequencing of the cohabiting plasmids to generate an interaction sequence tag (IST). These ISTs can then be used in a database search to decode the inferred protein-protein interactions.
We prepared pools for screening, each containing 96 bait or prey clones, performed the mating-based screening described above in all possible combinations between the pools, and finally revealed 4,549 independent two-hybrid interactions (3). Of these, 841 were detected more than three times and were assumed to be of high relevance. Hence we call these interactions our "core" data. Notably, more than 80% of these interactions had never been described before. A similar IST project was conducted by CuraGen (4), who screened a pool of approximately 6,000 preys with each unique bait. They revealed 691 interactions in total, most of which were also novel.
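The step from raw ISTs to "core" data can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the project: the clone names are hypothetical, and the cut-off is a parameter set here to match the wording above ("detected more than three times").

```python
from collections import Counter

def count_ist_hits(ists):
    """Count how many times each (bait, prey) pair was recovered as an IST."""
    return Counter(ists)

def core_interactions(hit_counts, min_hits):
    """Return the pairs recovered at least `min_hits` times."""
    return {pair for pair, n in hit_counts.items() if n >= min_hits}

# Hypothetical ISTs decoded from sequenced survivors (names are illustrative only).
ists = (
    [("bait_A", "prey_X")] * 4    # recovered four times
    + [("bait_B", "prey_Y")] * 2  # recovered twice
    + [("bait_C", "prey_Z")]      # recovered once
)

hit_counts = count_ist_hits(ists)
# "More than three times" corresponds to min_hits=4 (an assumption about the cut-off).
core = core_interactions(hit_counts, min_hits=4)
print(len(hit_counts), "independent interactions;", len(core), "in the core set")
```

Keeping the cut-off as an argument makes it easy to see how the size of the core set changes with the stringency of the filter.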
Comparison between the two data sets revealed an unexpectedly small overlap: they share 141 interactions, which correspond to approximately 10% of the total independent interactions (Fig. 1) (3). There are a number of plausible reasons for the small overlap. The systems used by the two groups were different: we used multicopy vectors in a host bearing multiple reporter genes, whereas they used single-copy vectors and only a single reporter gene. Since both groups PCR-amplified the ORFs, some would inevitably bear mutations that affect interactions. Although both groups pooled clones, the screens do not seem to be saturated: two-thirds of our 4,549 interactions (3) and one-third of CuraGen's 691 interactions (4) were identified only once. Of course, any two-hybrid screen contains false signals (see below). These and other unidentified factors are assumed to contribute to the small overlap observed between the two IST projects.
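The overlap figure can be checked with simple set arithmetic. The sketch below assumes that "total independent interactions" means the union of our 841 core interactions and CuraGen's 691; this reading is an inference, chosen because it reproduces the quoted figure, and the function name is illustrative.

```python
def overlap_fraction(n_a, n_b, n_shared):
    """Shared items as a fraction of the union of two sets (Jaccard index)."""
    union = n_a + n_b - n_shared
    return n_shared / union

# Assumed reading: core data (841) versus CuraGen data (691), 141 shared.
core_vs_curagen = overlap_fraction(841, 691, 141)

# For comparison: all 4,549 interactions versus CuraGen data, same 141 shared.
all_vs_curagen = overlap_fraction(4549, 691, 141)

print(f"core vs CuraGen: {core_vs_curagen:.1%}")
print(f"all  vs CuraGen: {all_vs_curagen:.1%}")
```

Under this reading the union holds 1,391 interactions and the overlap is about 10.1%; measured against all 4,549 interactions it would be about 2.8%.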
6,000 prey clones was mated with each unique bait strain, and the diploid cells formed were replica-plated onto the selection medium to decode interactions from the coordinates of the survivors. This approach is rather slow and tedious but is highly sensitive and free from the problem of unsaturated screening. They examined 142 baits to reveal 281 interactions, which again largely failed to overlap with those found by the IST approaches.
| FALSE POSITIVES |
|---|
What fraction of the genome-wide Y2H data is biologically relevant? To estimate the reliability of our data, we inspected a subset of our core data composed of 415 interactions because these interactions occur between two known proteins and hence can be, more or less, evaluated for their biological relevance. This analysis indicated that 50% of the interactions can be assumed to be biologically relevant (3).
More recently an interesting method was developed to evaluate the validity of interaction data based on the similarity of the gene expression profile between the genes for the bait and prey displaying a two-hybrid interaction (5). The analysis of our data by this method indicates that interactions with more than three IST hits, or our core data, are expected to be 60% reliable (5).
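To give a feel for what "similarity of the gene expression profile" can mean in practice, the sketch below scores a bait-prey pair by the Pearson correlation of two expression profiles. The choice of Pearson correlation and the profile values are assumptions made for illustration; the actual method of reference 5 is not described here and may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical log-ratio profiles over five conditions.
bait_profile = [0.1, 0.8, 1.5, 0.9, 0.2]
coexpressed_prey = [0.0, 0.7, 1.6, 1.0, 0.1]    # rises and falls with the bait
unrelated_prey = [1.2, 0.3, -0.5, 0.2, 1.1]     # moves in the opposite direction

print(round(pearson(bait_profile, coexpressed_prey), 2))
print(round(pearson(bait_profile, unrelated_prey), 2))
```

A pair whose genes score close to 1 is co-expressed in this toy sense; how such scores translate into a reliability estimate for a whole data set is the subject of the cited work, not of this sketch.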
While these two independent estimates may illustrate the overall quality of genome-wide two-hybrid data, users of these data still have to evaluate each interaction of their interest. Even in our non-core data, one does find a number of intriguing interactions. On the other hand, those with high IST hits may well contain a substantial number of biologically meaningless interactions. Therefore, bioinformatics tools to assist such evaluation are critical to fully exploit these genome-wide data (see below).
| FALSE NEGATIVES |
|---|
We further analyzed their data by examining the origin of each interaction examined in their study. The analysis revealed that 9 of the 19 interactions reproduced were originally detected by at least two of the three groups, whereas more than 90% of the interactions that they failed to recapitulate were those detected by only a single group. Although such irreproducible interactions may well be the technical false positives discussed above, some interactions seem to be sensitive to subtle differences in the constructs and Y2H system used, whereas others are largely insensitive and easily reproduced by anyone. Such a tendency may become more prominent when using full-length ORFs in the Y2H system because it is known that full-length proteins often show much weaker signals than appropriately trimmed protein regions containing the interaction domains. These features are inherent to the Y2H system and seem to have contributed to the small overlap observed between the different genome-wide screen data.
| Y2H AND OTHER INTERACTOME DATA |
|---|
For instance, Gavin et al. (7) and Ho et al. (8) purified 589 and 493 complexes, respectively, of which 93 were purified by both groups using the same proteins as baits. Comparison of the MS analyses of these 93 complexes between the two groups revealed that 48 complexes (52%) contain at least one protein detected by both groups, whereas the other 45 (48%) failed to share any. With respect to the entire set of proteins detected in these complexes, Gavin et al. (7) and Ho et al. (8) revealed 577 and 877, respectively. The overlap between these proteins was only 133, thereby comprising approximately 10% of the 1,321 proteins collectively reported by the two groups (Fig. 1). Even in the 48 complexes described above, the proteins detected in both studies comprise 14% of the total.
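The figures quoted above are mutually consistent, as a short worked calculation shows. Here G and H are simply labels introduced for the protein sets reported by Gavin et al. (7) and Ho et al. (8) for the 93 shared baits.

```latex
\[
|G \cup H| = |G| + |H| - |G \cap H| = 577 + 877 - 133 = 1321,
\qquad
\frac{|G \cap H|}{|G \cup H|} = \frac{133}{1321} \approx 0.10
\]
\[
\frac{48}{93} \approx 0.52,
\qquad
\frac{45}{93} \approx 0.48
\]
```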
Thus the rate of overlap is similar to the one observed in the two-hybrid projects. Although the strategies of the two groups are different and the comparison at the level of protein nexuses revealed by several different baits improves the overlap, it should be noted that even these proteomic studies contain substantial false signals.
It is also interesting to note that the interactions revealed by these approaches are somewhat complementary to those by the Y2H system. The Y2H projects essentially detect binary interactions including those of rather weak or transient nature. On the other hand, the MS studies reveal more complex interactions, which are inevitably biased toward those with high abundance and stability (9). Novel analytical platforms are thus required for the detection of weak or fast interactions by means of MS. One of the promising approaches would be an integration of MS with biomolecular interaction analysis based on the principle of surface plasmon resonance (10).
Intriguing features of these data sets are also revealed by the integration of gene expression data (11). The data set by Gavin et al. (7), based on genomic integration of tandem affinity purification tags, displays strong co-expression among the genes encoding the identified proteins. In contrast, the data set by Ho et al. (8), based on episomal overexpression of epitope-tagged proteins, shows rather weak co-expression patterns similar to those of the Y2H projects. Recent analysis of accumulated protein interaction data provides further detail on the various aspects of both Y2H and MS data sets (9).
| HUGE PROTEIN INTERACTION NETWORK |
|---|
4,000 proteins (3, 12, 13). The additional interaction data by co-precipitation/MS studies would further expand the largest network. The entire network is obviously too complex for the human brain to understand. We need a method to extract biologically meaningful clusters or subnetworks from the huge nexus to formulate a novel hypothesis for further experimentation. However, one should note that the network has become too complex due to the lack of spatial and temporal resolution. For instance, while RNA polymerase I, II, and III are distinct entities, they would be linked into a single huge nexus in silico because of common subunits shared by the three. Thus, we have to integrate existing knowledge on yeast proteins with the massive interactome data. In addition, we should evaluate the relevance of each interaction provided by any large-scale projects. Ideally each edge of the complex graph should be "weighed" to help the evaluation of each interaction. As discussed above, the number of IST hits may serve as a good measure for the reliability of Y2H data (3, 5). The independent lines of evidence for the interaction, such as coincidence between Y2H and MS data, presence of genetic interaction, similar mutant phenotype, shared subcellular localization, and co-expression of the genes, would be more important.
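The RNA polymerase example can be reproduced on a toy graph. In the sketch below, three otherwise separate complexes collapse into one connected component as soon as a shared subunit is linked to each of them. All node names are hypothetical placeholders, not real subunit names.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:  # breadth-first walk from `start`
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Interactions within three distinct complexes (hypothetical subunits).
specific_edges = [
    ("polI_a", "polI_b"),
    ("polII_a", "polII_b"),
    ("polIII_a", "polIII_b"),
]
# One subunit common to all three complexes.
shared_edges = [
    ("shared", "polI_a"),
    ("shared", "polII_a"),
    ("shared", "polIII_a"),
]

print(len(connected_components(specific_edges)))                 # complexes kept apart
print(len(connected_components(specific_edges + shared_edges)))  # merged in silico
```

Separating such merged entities again requires information that the graph alone does not carry, which is why existing knowledge on the proteins has to be integrated with the interaction data.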
Even provided with these valuable data, construction of the protein interaction network model is still a tedious task that requires many trials and errors. We thus developed a bioinformatics tool to visualize and help one estimate the structure of networks by referring to existing knowledge and other data (Fig. 2) (3, 14). Such tools would become critical to fully exploit interactome data as well as a plethora of other functional genomic data.
| TOP-DOWN APPROACH TO INTERPRET HUGE INTERACTION NETWORK |
|---|
Intriguingly, a highly biased distribution of connectivity has proved to ensure robustness of the network against random perturbations. On the other hand, such networks are extremely vulnerable to targeted attacks on the highly connected nodes, or hubs. Consistent with this, such hubs of the huge protein interaction networks tend to be the products of essential genes (16). Thus, the protein interactome seems to share a basic design principle with other complex networks. It is also conceivable that highly connected nodes lacking apparent homologs in mammals may serve as good targets for antifungal drug development.
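The contrast between random perturbation and targeted attack can be seen even in a very small example. The sketch below counts node degrees, picks the most connected node, and compares the size of the largest connected component after removing a peripheral node versus the hub. The star-shaped toy network and its node names are assumptions for illustration only.

```python
from collections import Counter

def degrees(edges):
    """Number of interaction partners for every node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def hubs(edges, top_n=1):
    """The `top_n` most highly connected nodes."""
    return [node for node, _ in degrees(edges).most_common(top_n)]

def largest_component_size(edges, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    nodes = {n for edge in edges for n in edge} - removed
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    best, seen = 0, set()
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first walk from `start`
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best

# Toy network: one hub "H" connected to five peripheral nodes.
edges = [("H", leaf) for leaf in "ABCDE"]

print(hubs(edges))
print(largest_component_size(edges, removed={"A"}))  # loss of a peripheral node
print(largest_component_size(edges, removed={"H"}))  # loss of the hub
```

Losing a peripheral node leaves the rest of the network connected, whereas losing the hub leaves only isolated nodes.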
The complex network has a heterogeneous organization with local clustering or "communities," which may represent functional modules to be identified in the case of the protein interactome. Recently an intriguing approach was developed to dissect a complex network into communities based solely on the topology of the network and was successfully applied to biological and social networks (17). This approach calculates the shortest paths between every pair of nodes in the complex graph and identifies the edge most frequently used by these paths. The edge with the highest usage is just like the main road connecting one town to another, and cutting the network at this edge can properly split the network into two communities. This method can be applied to the dissection of protein interaction networks. Indeed, an application of a similar idea to a complex protein network, which was constructed from the conserved co-occurrence of genes in operons, was reported to successfully split a single huge network into clusters composed of functionally more homogeneous proteins (18).
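The idea of cutting the most heavily used edge can be sketched as follows. The usage count computed here is what is usually called edge betweenness; the code performs a single removal step on a toy graph and is an illustrative simplification, not the implementation of reference 17 (a full procedure would recompute the counts and repeat the removal until the desired number of communities is reached). Node names are hypothetical.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Shortest-path usage of each undirected edge (BFS-based accumulation)."""
    usage = defaultdict(float)
    for source in adj:
        dist = {source: 0}
        n_paths = defaultdict(float)  # number of shortest paths from source
        n_paths[source] = 1.0
        preds = defaultdict(list)
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        credit = defaultdict(float)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                share = n_paths[v] / n_paths[w] * (1.0 + credit[w])
                usage[frozenset((v, w))] += share
                credit[v] += share
    # Every node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in usage.items()}

def components(adj):
    """Connected components of the graph."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        found.append(comp)
    return found

def split_once(adj):
    """Remove the single most used edge and return it with the resulting components."""
    adj = {v: set(neighbors) for v, neighbors in adj.items()}
    usage = edge_betweenness(adj)
    busiest = max(usage, key=usage.get)
    a, b = tuple(busiest)
    adj[a].discard(b)
    adj[b].discard(a)
    return busiest, components(adj)

# Toy network: two triangles (A-B-C and D-E-F) joined by the edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

busiest, communities = split_once(adj)
print(sorted(busiest), [sorted(c) for c in communities])
```

In the toy graph every shortest path between the two triangles runs through the edge C-D, so it carries the highest usage and its removal separates the two triangles.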
| FROM CATALOGING TO FUNCTIONAL INTERPRETATION |
|---|
One key would be "profiling" of the interaction. It is quite informative to learn when and where an interaction occurs. Combinatorial use of recent proteomic techniques for protein complex purification and expression profiling will play a major role for this purpose (19-21). The set of yeast strains bearing tandem affinity purification-tagged ORFs would serve as a valuable resource to perform these analyses.
Although the interaction profiling is of particular importance, it provides us with nothing more than an indirect hint for the role of an interaction. To unequivocally uncover it, one has to examine what happens if the interaction is specifically disrupted.
To perform such an "interaction targeting," one has to know the protein regions that mediate the interaction. Once such interaction domains are pinpointed, they can be overexpressed as "dominant negatives" to disrupt the cognate interaction between the endogenous proteins. Furthermore, they can be used for the isolation of interaction-defective mutants. The most versatile method for the mapping of interaction domains is obviously the Y2H system. Notably it can also be used as so-called "reverse" two-hybrid selection to select against interaction (22, 23). Using the reverse Y2H system, one can select interaction-defective alleles from a randomly mutagenized population. Once identified, responsible mutations can be easily introduced into the genome using a standard technique of yeast molecular genetics, and phenotypes of such interaction-defective mutants are expected to tell the biology of the interaction.
A pitfall of the reverse Y2H system is the lack of discrimination between missense and nonsense mutations, the latter of which abolish all of the functions borne by the regions C-terminal to the mutations and hence have to be avoided in interaction targeting. Similarly, missense mutations destabilizing the protein should be eliminated. To achieve this goal, we applied the dual bait Y2H method (24, 25) to guarantee that the introduced mutations induce neither truncation nor destabilization of the protein to be analyzed (26). This "guaranteed" reverse Y2H system is ideal for the identification of missense mutations suitable for interaction targeting to clarify the biological role for the interaction per se.
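The distinction between missense and nonsense mutations is easy to state at the sequence level, even though the dual bait method makes it experimentally, by selection. The sketch below classifies a single codon change using the standard genetic code; it is only an illustration of the terminology, says nothing about protein stability, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_codon_change(wild_type, mutant):
    """Classify a codon substitution as silent, nonsense, missense, or stop-loss."""
    before = CODON_TABLE[wild_type.upper()]
    after = CODON_TABLE[mutant.upper()]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"   # premature stop: everything C-terminal is lost
    if before == "*":
        return "stop-loss"  # original stop codon replaced by an amino acid
    return "missense"       # one amino acid replaced by another

print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
```

Only the missense class is of interest for interaction targeting, and even there a sequence-level check cannot tell whether the altered protein remains stable.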
The applications described above exemplify the potential of the Y2H system as a tool for functional characterization of protein-protein interactions. It should be noted that even the interactions originally detected by other means can be similarly analyzed if they are successfully recapitulated by the Y2H system. However, the current Y2H system is prone to show false negative signals (see above). Hence the development of the Y2H system with low false negatives would be of particular significance in accelerating the functional analysis of cataloged protein-protein interactions to proceed into the next stage of protein interactome analysis.
| CONCLUSION |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, September 16, 2002, DOI 10.1074/mcp.R200005-MCP200
1 The abbreviations used are: Y2H, yeast two-hybrid; ORF, open reading frame; IST, interaction sequence tag; MS, mass spectrometric or mass spectrometry.
* This work was supported in part by research grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), the Japan Society for the Promotion of Science (JSPS), and the New Energy and Industrial Technology Development Organization (NEDO). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
|| Recipient of the postdoctoral fellowship from JSPS.
¶ To whom correspondence should be addressed: Division of Genome Biology, Cancer Research Inst., Kanazawa University, 13-1 Takaramachi, Kanazawa 920-0934, Japan. Tel.: 81-76-265-2726; Fax: 81-76-234-4508; E-mail: titolab{at}kenroku.kanazawa-u.ac.jp
| REFERENCES |
|---|
kinase GCN2. J. Biol. Chem. 276, 17591-17596