Identification of the Substrates and Interaction Proteins of Aurora Kinases from a Protein-Protein Interaction Model

The increasing use of high-throughput and large-scale bioinformatics-based studies has generated a massive amount of data stored in a number of different databases. The major need now is to explore this disparate data to find biologically relevant interactions and pathways. Thus, in the post-genomic era, there is clearly a need for the development of algorithms that can accurately predict novel protein-protein interaction networks in silico. The evolutionarily conserved Aurora family kinases have been chosen as a model for the development of a method to identify novel biological networks by a comparison of human and various model organisms. Our search methodology was designed to predict and prioritize molecular targets for Aurora family kinases, so that only the most promising are subjected to empirical testing. Four potential Aurora substrates and/or interacting proteins, TACC3, survivin, Hec1, and hsNuf2, were identified and empirically validated. Together, these results justify the timely implementation of in silico biology in routine wet-lab studies and have also allowed the application of a new approach to the elucidation of protein function in the post-genomic era.

Molecular & Cellular Proteomics 3:93-104, 2004.
One possible path toward understanding the biological function of a target gene is to discover how it interfaces with known protein-protein interaction networks. We are only now beginning to appreciate the nature and complexity of these networks, and constructing them with traditional biochemical approaches remains a significant challenge. Recently, high-throughput technologies such as large-scale yeast two-hybrid analysis have generated an enormous amount of data (1-4), and researchers now often face the dilemma of how to use this information effectively. Investigators who rely solely on a traditional wet-lab approach to make decisions or set research priorities are likely to be outpaced by peers who combine in silico biology with empirical methods. There is therefore a clear need for a systematic, stepwise approach that predicts or prioritizes potential targets in silico and so aids understanding of how complex biological systems work. Given these advantages, it seems reasonable to propose that in silico analysis will become an essential tool for the initial evaluation of novel hypotheses and will offer a better rationale for target prioritization, so that, in principle, only the most promising targets need to be tested empirically. The goal of this study, therefore, was to create a virtual protein-protein interaction model based on the concepts that protein-protein interactions require precise spatial proximity (compartmentalization) and temporal synchronicity (cell-cycle stage).
Given the availability of information for different model organisms, the evolutionarily conserved Aurora kinase family was selected as a model to identify novel biological networks from yeast to humans. The Aurora kinases, a family of mitotic serine/threonine kinases, have been conserved throughout evolution, as reflected by the presence of homologues in a variety of model organisms, including Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila, Xenopus, mouse, and human. The family has one member in S. cerevisiae (Ipl1), two in C. elegans (AIR-1 and AIR-2), two in Drosophila (Aurora-A and IAL), and three in humans (Aurora-A, Aurora-B, and Aurora-C; reviewed in Refs. 5-7). In yeast, Ipl1 is normally localized to the spindle pole body (8) and, during mitosis, is mainly associated with the kinetochore and the mitotic spindles (9). In C. elegans, AIR-1 is localized at the centrosomes (10), whereas AIR-2 is localized at the kinetochore and midbody (11). In Drosophila, only the centrosome-associated Aurora-A has been studied thoroughly (12). In humans, Aurora-A is localized to the centrosome in prophase and subsequently spreads to the mitotic spindles/centrosomes, where it remains until the end of mitosis (13). In contrast, Aurora-B is found at the kinetochore and, during anaphase, mainly at the midzone. Aurora-C is localized to the centrosomes from anaphase to cytokinesis (14). Taken together, these data indicate that all Aurora variants from different species are localized to the mitotic apparatus.
Several potential substrates and interacting proteins of the Aurora variants have been identified in the different model organisms. In yeast, Ndc10 (15), Sli15 and Dam1 (kinetochore proteins; 16), Cin8 (a kinesin; 9), and histone H3 (17) have been shown to serve as substrates of Ipl1, implying a regulatory role for Ipl1 in kinetochore-microtubule attachment. Sli15 physically interacts with Ipl1 via an INCENP box in its C-terminal region (18,19), and this complex has been suggested to regulate the bi-orientation of chromosomes during mitosis. In C. elegans, AIR-1 is required for centrosome maturation and for the proper localization of centrosomal proteins such as CeGrip and ZYG-9 (20). In Drosophila, Aurora-A can phosphorylate and interact with the centrosomal protein D-TACC (21). In the frog, the Aurora-A homologue Eg2 (22) can phosphorylate the motor protein Eg5 (23). Survivin, which has been implicated in both the control of cell division and the inhibition of apoptosis, associates biochemically with Aurora-A (24) and Aurora-B (24,25). These known interactions indicate that the Aurora family members can execute a wide range of biological functions, and an in silico search for Aurora substrates and protein interaction networks may be a first concerted step toward understanding the molecular basis of Aurora kinase regulation.
The first stage of this study assessed computational tools for accessing public bioinformatics resources and for translating raw large-scale, high-throughput data into a possible protein-protein interaction network for the Aurora family kinases. This involved a stepwise combination of extensive literature searches and database analyses of yeast protein-protein interactions, subcellular localization, homology, and gene expression, and it identified four potential substrates and/or interacting proteins of the human Aurora variants. In the second stage, these prioritized targets were subjected to traditional wet-lab approaches to validate the predicted biochemical interactions; this confirmed findings reported earlier (21,24) and also revealed novel, previously unreported biochemical interactions.
In summary, building such a protein-protein interaction search platform not only provides a new approach to the discovery of Aurora function but also suggests that similar analyses could be applied to other, as yet poorly characterized proteins in the post-genomic era.

EXPERIMENTAL PROCEDURES
Construction of the Protein-Protein Interaction Network-Protein-protein interactions were identified using the keyword search function provided at dip.doe-mbi.ucla.edu and were further classified into two categories according to whether they were derived from small-scale experiments (orange lines) or high-throughput analyses (blue dashed lines), as illustrated in Fig. 2. Proteins without assigned names or characterized functions, as listed at genome-www.stanford.edu/Saccharomyces/, were excluded from this search. Protein annotation is also provided at this web site and in the Gene Ontology annotation (www.geneontology.org), permitting selection of proteins that fit the inclusion criteria (keywords: spindle pole and kinetochore). The YPD database (www.proteome.com/databases/YPD/) provides an excellent search engine; however, it is not publicly accessible.
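The exclusion and inclusion criteria above can be expressed as a small filter. This is only an illustrative sketch: the `Protein` record, its fields, and the toy entries are hypothetical stand-ins for annotations retrieved from the Saccharomyces Genome Database and Gene Ontology, and simple case-insensitive substring matching is assumed for the keyword test.

```python
from dataclasses import dataclass, field
from typing import Optional

# Spatial keywords used as inclusion criteria in this study.
KEYWORDS = ("spindle pole", "kinetochore")


@dataclass
class Protein:
    """Hypothetical annotation record for one yeast protein."""
    orf: str                     # systematic identifier (toy values below)
    name: Optional[str]          # standard gene name; None if unassigned
    function_known: bool         # True if a function has been characterized
    components: set = field(default_factory=set)  # cellular-component terms


def passes_filters(p: Protein, keywords=KEYWORDS) -> bool:
    """Drop unnamed or uncharacterized proteins, then require a keyword match."""
    if p.name is None or not p.function_known:
        return False
    terms = [t.lower() for t in p.components]
    return any(k in t for t in terms for k in keywords)


def split_by_evidence(edges):
    """Group (a, b, evidence) edges by evidence type, e.g. small-scale vs high-throughput."""
    groups = {}
    for a, b, evidence in edges:
        groups.setdefault(evidence, []).append((a, b))
    return groups


# Hypothetical toy annotations, not real database entries.
toy = [
    Protein("Y0001", "PROT1", True, {"kinetochore"}),
    Protein("Y0002", None, False, {"spindle pole body"}),
    Protein("Y0003", "PROT3", True, {"mitochondrion"}),
]
print([p.orf for p in toy if passes_filters(p)])  # ['Y0001']
```

Keeping the evidence label on each edge lets the small-scale (orange) and high-throughput (blue dashed) interactions be drawn or weighted separately later.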
Finding Homologues for the Different Model Organisms-The proteins in the map (see Fig. 2A) were further analyzed using the National Center for Biotechnology Information BLAST software (version 2.0) and the BLAST searches provided at www.wormbase.org and flybase.bio.indiana.edu. Some homologues were selected on the basis of a literature search. All protein sequences were translated from their GenBank entries, and homologues were aligned using the CLUSTAL W program.
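The homologue assignments above combine BLAST searches with the literature, and no formal ortholog rule is stated. One common heuristic, swapped in here purely for illustration, is the reciprocal best hit: a yeast protein and a human protein are paired when each is the other's top-scoring hit. The sketch assumes BLAST results have already been parsed into (query, subject, E-value, bit score) tuples; the identifiers and the 1e-5 cutoff are hypothetical.

```python
def best_hits(hits, max_evalue=1e-5):
    """Map each query to its highest-scoring subject that passes the E-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-5):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


# Hypothetical parsed hits: yeast -> human (forward) and human -> yeast (reverse).
forward = [("yA", "hA", 1e-30, 120.0), ("yA", "hB", 1e-10, 60.0), ("yB", "hB", 1e-3, 30.0)]
reverse = [("hA", "yA", 1e-30, 118.0), ("hB", "yA", 1e-12, 62.0)]
print(reciprocal_best_hits(forward, reverse))  # {'yA': 'hA'}
```

Here yB receives no assignment because its only hit fails the cutoff, and hB is not paired because its best hit, yA, prefers hA. Divergent proteins that sequence searches miss would still need literature-based assignment, as in the paper.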
Cell Culture, Transfection, Co-immunoprecipitation, and Kinase Assay-Human 293T cells from the American Type Culture Collection (Manassas, VA) were maintained at 37°C in a 5% CO2 incubator in Dulbecco's modified Eagle's medium supplemented with 10% heat-inactivated fetal bovine serum and 100 μg/ml penicillin/streptomycin (Life Technologies, Grand Island, NY). Flag-tagged Aurora-A, -B, and -C, hemagglutinin (HA)-tagged Hec1 and hsNuf2, and green fluorescent protein (GFP)-tagged TACC1, TACC2, and TACC3 were expressed ectopically in 293T cells by transfection with LipofectAMINE™ according to the manufacturer's instructions. Forty-eight hours after transfection, cells were lysed with RIPA buffer (150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate, 1% Nonidet P-40, 50 mM Tris, pH 8.0). Equal amounts of total lysate (500 μg) were immunoprecipitated with 2 μg of anti-Flag (Sigma, St. Louis, MO), anti-HA (3F10; Roche Molecular Biochemicals, Indianapolis, IN), or anti-GFP (Roche Molecular Biochemicals) monoclonal antibody at 4°C for 2-4 h. Protein A-agarose beads (Upstate Biotechnology, Lake Placid, NY) were then added, and incubation continued for another 2 h at 4°C. Beads were washed three times with buffer containing 10 mM HEPES, pH 7.0, 2 mM MgCl2, 50 mM NaCl, 5 mM EGTA, 0.1% Triton X-100, and 60 mM 2-glycerophosphate, and twice with TBS buffer (38.5 mM Tris, pH 7.4, and 150 mM NaCl). Immune complexes were either subjected to SDS-PAGE or resuspended in kinase reaction buffer (25 mM Tris, pH 7.4, 10 mM MgCl2, 10% glycerol, 1 mM dithiothreitol, 1 mM Na3VO4, 1 mM NaF, 20 μM ATP, and 10 μCi [γ-32P]ATP) with 1 μg of purified recombinant glutathione S-transferase (GST)-Aurora-A, GST-Aurora-B, or His-Aurora-C and incubated at 30°C for 30 min. Alternatively, 1 μg each of GST-Hec1 and GST-hsNuf2 was incubated with 1 μg of each recombinant Aurora family kinase, and the kinase reaction was performed at 30°C for 10 min.
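Loading an equal amount of total lysate (500 μg) into each immunoprecipitation is simple arithmetic once protein concentrations are known. The paper does not say how lysate protein was quantified, so the concentrations below are assumed to come from a standard protein assay; the sample names, values, and the choice to top up each tube with lysis buffer to a common volume are hypothetical.

```python
def lysate_volumes(conc_ug_per_ul, target_ug=500.0):
    """Return {sample: (lysate_ul, buffer_ul)} so every tube holds target_ug in one volume."""
    if any(c <= 0 for c in conc_ug_per_ul.values()):
        raise ValueError("concentrations must be positive")
    # Volume of each lysate that contains target_ug of total protein.
    lysate = {s: target_ug / c for s, c in conc_ug_per_ul.items()}
    # The most dilute sample sets the common final volume; others are topped up.
    final_ul = max(lysate.values())
    return {s: (round(v, 1), round(final_ul - v, 1)) for s, v in lysate.items()}


# Hypothetical concentrations in ug/ul.
print(lysate_volumes({"sample_1": 2.0, "sample_2": 2.5}))
# {'sample_1': (250.0, 0.0), 'sample_2': (200.0, 50.0)}
```

Equalizing the volume as well as the protein amount keeps antibody and bead concentrations comparable across tubes; whether the original protocol did this is not stated.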
The kinase reaction mixtures or cell lysates were mixed with SDS sample buffer and separated on 12.5 or 8% SDS-polyacrylamide gels. Proteins were transferred to polyvinylidene difluoride membranes and detected by autoradiography or by immunoblotting with a 1:1000 dilution of anti-GFP, anti-Flag, or anti-HA antibody. Bound antibodies were detected with horseradish peroxidase-conjugated secondary antibodies and visualized using the ECL system (Amersham Pharmacia Biotech, Piscataway, NJ).
Yeast Two-Hybrid Analysis-Standard techniques were used for the yeast two-hybrid system (26-28). Briefly, each Aurora family gene and each predicted candidate gene was cloned in frame with the GAL4 DNA-binding domain (GAL4 BD) in the pGBT-9 vector or fused to the GAL4 activation domain (GAL4 AD) in the plasmid pACT-2 (MATCHMAKER Two-Hybrid System; Clontech, Palo Alto, CA). The yeast strain Y190 was cotransformed with the GAL4 BD and GAL4 AD constructs. Positive clones grew on Trp, Leu, and His dropout medium supplemented with 3-aminotriazole (an inhibitor of the HIS3 gene product) and turned blue in the β-galactosidase filter assay. β-Galactosidase activity was determined as previously reported (29). Briefly, yeast cells from overnight cultures, adjusted to the same optical density, were collected and resuspended in 100 ml Tris, pH 7.4, and 0.05% Triton X-100. The resuspended cells were subjected to five freeze-thaw cycles at -80°C, followed by addition of Z-buffer (150 mM phosphate buffer, pH 7, 10 mM KCl, and 1 mM MgSO4) containing o-nitrophenyl-β-D-galactopyranoside as the substrate (4 mg/ml final concentration). The reactions were carried out at 30°C and stopped by the addition of 250 mM sodium carbonate, and enzyme activity was determined by measuring the absorbance at 420 nm.
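Because the cells were adjusted to the same optical density, A420 readings from the liquid assay can be compared directly. A common way to report such results, assumed here rather than taken from the paper or Ref. 29, is the Miller unit, 1000 × A420 / (t × V × OD600), where t is the reaction time in minutes and V is the culture volume assayed in milliliters. The example values are hypothetical.

```python
def miller_units(a420, minutes, volume_ml, od600):
    """beta-galactosidase activity in Miller units: 1000 * A420 / (t * V * OD600)."""
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume, and OD600 must be positive")
    return 1000.0 * a420 / (minutes * volume_ml * od600)


# Hypothetical reading: A420 = 0.5 after 30 min from 0.1 ml of cells at OD600 = 1.0.
print(round(miller_units(a420=0.5, minutes=30, volume_ml=0.1, od600=1.0), 1))  # 166.7
```

Normalizing by time, volume, and cell density makes readings taken under slightly different conditions comparable, which complements the equal-OD adjustment described above.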

Constructing a Protein-Protein Interaction Network Centered on the Aurora Yeast Homologue Ipl1-It is becoming increasingly apparent that protein interaction networks are extremely complex, and there is a growing need to determine quickly where new, uncharacterized proteins interface with them. As a step toward developing such networks, we investigated the applicability of a bioinformatics system centered on the interactions of the Aurora kinase family. To enable a predictive understanding of the interactions of the human Aurora family kinases relative to those in various model organisms, data from large-scale yeast protein-protein interaction databases and microarray studies were used to reconstruct protein-protein interaction networks centered on Ipl1, the yeast homologue of the Aurora kinases. This model was based on the notion that some of the interacting proteins and/or substrates of the human Aurora family kinases may also be evolutionarily conserved and may interact with each other in the same temporal and spatial configurations. Construction of the model involved five bioinformatics steps, and the predicted interactions were then verified empirically.
Step 1. Identification of Ipl1-Interacting Proteins from Published Small-scale Experimental Studies-The first step in generating the interaction model was an extensive literature search, using the keyword Ipl1, to acquire the currently published experimental data on interaction networks centered on Ipl1. A functional network, however, is not limited to physical protein-protein interactions but also includes genetic and biochemical interactions. We therefore combined all available genetic, biochemical, and physical interaction data centered on Ipl1. These data are summarized in Fig. 1, which shows that Ipl1 interacts physically (orange lines) and genetically (8) with Sli15 (19) and Dam1 (30) and phosphorylates Sli15 (30), Ndc10 (15), Dam1 (30), and Cin8 (9) (pink lines). Moreover, processes that are reversibly controlled by protein phosphorylation require not only a protein kinase but also a protein phosphatase. The biochemical interaction between the phosphatase Glc7 and Ndc10 (31) (black line) was therefore also included in our model.
Step 2. Establishment of the Protein-Protein Interaction Network by Analysis of Publicly Accessible Databases-The Ipl1-interacting molecules summarized in Fig. 1 were used as templates for reconstructing the interaction networks from the different yeast protein-protein interaction databases (data collections at dip.doe-mbi.ucla.edu and portal.curagen.com). These include small-scale (orange lines) and high-throughput (blue dashed lines) yeast two-hybrid experiments. To date, four comprehensive large-scale yeast protein-protein interaction databases have provided ~6000 uncharacterized interactions (1-4), despite discrepancies among the different reports (32). To cover potential candidates broadly, we included three steps of radial network expansion with Ipl1 at the center (Fig. 2A). The search identified 192 known yeast proteins in this radial protein-protein interaction network (partially illustrated in Fig. 2A).
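The three-step radial expansion around Ipl1 amounts to a breadth-first search that stops at depth 3. The sketch assumes interactions have already been reduced to undirected pairs of protein names; apart from the Ipl1-Cin8 first step mentioned in the Fig. 2 legend, the toy edges and protein names (P1-P3) are hypothetical.

```python
from collections import defaultdict, deque


def radial_expand(edges, seed, depth=3):
    """Return {protein: steps from seed} for every protein within `depth` steps."""
    neighbors = defaultdict(set)
    for a, b in edges:              # treat interactions as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:     # do not expand beyond the last step
            continue
        for nb in sorted(neighbors[node]):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Hypothetical toy network: P3 lies four steps from Ipl1 and is excluded.
edges = [("Ipl1", "Cin8"), ("Cin8", "P1"), ("P1", "P2"), ("P2", "P3")]
print(radial_expand(edges, "Ipl1"))  # {'Ipl1': 0, 'Cin8': 1, 'P1': 2, 'P2': 3}
```

Run on the real database edges, the keys of the returned dictionary would play the role of the 192-protein candidate set that Step 3 then filters by localization.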
Step 3. Protein-Protein Interaction in the Proper Spatial Configuration-Because intracellular events may be compartmentalized to unique intracellular locations, we added a spatial component to the model to provide additional specificity for target selection. Because the Aurora variants are localized to the mitotic apparatus in organisms ranging from yeast (9) to human (7), association with the mitotic spindle or kinetochore became a prerequisite for the selection of potential Ipl1-interacting proteins. We therefore prioritized our target selections using two spatial keywords (spindle pole and kinetochore) and searched the information available at genome-www.stanford.edu/Saccharomyces/ and www.proteome.com/databases/YPD/. Thirty-three of the 192 proteins (~17%) remained in the map (balls containing protein names; Fig. 2A). Among these 33 proteins, four complexes were identified: the inner kinetochore (CBF3 complex), the central kinetochore (Ndc80 complex), and the outer kinetochore (Dam1 complex; for review see Ref. 16) (circled by the green dashed line; Fig. 2A), and the spindle pole (γ-tubulin complex; Ref. 33) (circled by the red dashed line; Fig. 2A). This analysis raises the possibility that Ipl1 regulates these complexes as a whole rather than one or more of their individual components.
FIG. 1. The biochemical and physical interaction networks for the yeast protein Ipl1, a homologue of the human Aurora serine/threonine family kinases. This figure summarizes the results of the literature search for Ipl1 substrates and cooperators, which form the basis of the protein-protein interaction networks. Ipl1 phosphorylates Sli15, Cin8, Ndc10, and Dam1 (pink lines with balls labeled P, for phosphorylation). Glc7 serves as a phosphatase counteracting Ipl1 function and dephosphorylates Ndc10 (black line). Physical interactions between two proteins are illustrated by orange lines.
FIG. 2. In silico conversion from the yeast protein-protein interaction databases to human interaction networks with specific compartmentalization. A, Protein-protein interaction networks based on spatial relationships. To provide maximal coverage of the potential interactome, three expansion steps were centered on Ipl1; for example, the first expansion step is from Ipl1 to Cin8. Comprehensive database analyses revealed 192 proteins as potential interaction candidates in radial networks centered on Ipl1. Large- and small-scale protein-protein interaction data are indicated by blue dashed lines and orange lines, respectively. These candidate proteins were further prioritized using searches (genome-www.stanford.edu/Saccharomyces/ and www.proteome.com/YPD) based on two spatial keywords (spindle pole and kinetochore), and 33 yeast proteins were selected on the basis of their unique localization characteristics during mitosis. Further literature reviews identified the proteins in the spindle pole body complex (circled by red dashed line) and in the kinetochore complex (circled by green dashed line). Balls with numbers only (protein numbers) indicate proteins that do not meet the compartmentalization criteria, whereas balls with names represent proteins localized to either the spindle pole or the kinetochore. B, Transformation from yeast genes to human homologues. Fifteen yeast proteins (balls with protein names) have homologues in the human genome, as determined by identical localization and similarity in sequence and function. These proteins, which are considered to have conserved human counterparts, are shown in the interaction networks (see also supplementary Table I). To the best of our knowledge, and based on searches of the literature and sequence databases, human homologues have not been identified for the other 18 proteins (balls with question marks).
Step 4. Conversion from Yeast Interaction Networks to Human Homologues-To establish the interaction networks for the mammalian system, the selected candidates were converted into their higher-organism counterparts based on sequence and functional homology in different model organisms such as C. elegans (www.wormbase.org) and Drosophila (flybase.bio.indiana.edu). Searching for orthologous interactions in this way, however, depends on the interactions represented in yeast and therefore misses interactions unique to the other species. The comprehensive analysis of these 33 proteins (Fig. 2A) across the model organisms is not shown; however, the yeast interaction networks and their human counterparts (assigned on the basis of searches at www.ncbi.nlm.nih.gov/BLAST, mips.gsf.de/genre/proj/yeast/, or www.yeastgenome.org/, or of the literature) are presented in Fig. 2B. Only 15 putative Aurora-interacting proteins could be assigned definite homologues in these model organisms. Eighteen yeast proteins, including Dam1 and Ndc10 (Fig. 2, A and B; balls containing question marks), have, to the best of our knowledge, no human counterparts. Of the 15 interacting proteins, INCENP (25), survivin (24,25), hsEg5 (34), and TACC3 (35) have previously been reported to be related to mitosis and to interact with members of the Aurora family. No association with the Aurora family had been demonstrated for the remaining proteins, suggesting that this search identified several potentially novel Aurora-interacting partners.
Step 5. Cluster Analysis of the HeLa Cell Cycle Microarray Database-Aurora family members are highly expressed, at both the transcript and protein levels, during mitosis. This raises the possibility that their interacting proteins and/or substrates exhibit similar expression profiles during cell division; in other words, gene expression clusters may correspond to functional categories (36-41) and may be used to aid target selection. We therefore asked whether genes with expression profiles similar to those of the human Aurora family members could be identified in publicly accessible proprietary microarray databases from which data could technically be downloaded. We focused on a cell cycle progression (temporal segregation) database that provides substantial descriptions of transcriptomes and allows analysis of periodically regulated genes within the HeLa cell cycle (genome-www.stanford.edu/Human-CellCycle/HeLa; Ref. 42). Of the human Aurora variants, the cDNAs of Aurora-A (serine/threonine kinase 15) and Aurora-B (serine/threonine kinase 12) are present in this database, which was searched for genes with expression profiles similar to that of Aurora-A or Aurora-B by applying the clustering algorithm (37) with Pearson's correlation. The search yielded 50 clones with an expression pattern similar to that of Aurora-A or Aurora-B (supplementary Fig. 1). After repetitive clones were filtered out, a literature search was conducted to identify genes with a distinct subcellular localization and homologues in the different model organisms (clones with yeast homologues are listed in supplementary Table I, middle and right columns). Of all the potential candidates listed in supplementary Table I, four genes, TACC3, survivin, hsNuf2, and Hec1, were consistently identified in all searches (Fig. 2 and supplementary Table I, highlighted with yellow boxes). TACC3 and hsNuf2 exhibited cell cycle expression patterns similar to those of both Aurora-A and Aurora-B, whereas the expression patterns of survivin and Hec1 were similar to those of Aurora-A and Aurora-B, respectively. Furthermore, TACC3 (21) and survivin (24,25) have been shown to interact biochemically and/or genetically with Aurora family members, suggesting that our search method identifies not only known targets but, more importantly, also putative novel interacting partners.
Assignment of Spc72 as the Yeast Homologue of TACC-Analysis of our proposed global interaction networks revealed one yeast protein, Spc72, that shares characteristics with the human TACC proteins. For instance, Spc72 and TACC each require another microtubule-binding protein, Stu2 (yeast; Fig. 2A) and CH-TOG (human; Fig. 2B), respectively, to associate with mitotic spindles (43,44). In addition, Spc72 and the TACC family members are acidic (pI = 4.9), and all contain a relatively conserved coiled-coil domain. Given this evidence, Spc72, amphibian maskin (45), fly D-TACC (46), and the TACC family members were aligned using the CLUSTAL W program (47); the result further suggests that Spc72 and the TACC members share a similar TACC domain (35), with amino acid similarities ranging from 36 to 43% across the model organisms (Fig. 3). A similar argument has been made for proteins that share a carboxyl-terminal "INCENP box," i.e. Sli15 (yeast), ICP-2 (C. elegans), and INCENP (human, Drosophila), which have been proposed to be homologues from yeast to vertebrates (18). We therefore suggest that Spc72 may be a functional homologue of the TACC proteins in yeast.
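The Spc72/TACC comparison reports percentage similarity from a CLUSTAL W alignment, but the exact scoring convention is not stated. The sketch therefore assumes one: percentages are computed over aligned columns in which neither sequence has a gap, and "similar" means identical or both residues falling in one of the CLUSTAL strong conservation groups listed in the code. The toy sequences are hypothetical, not the aligned TACC domains.

```python
# CLUSTAL-style "strong" conservation groups (assumed similarity criterion).
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def similar(a, b):
    """True if two residues are identical or share a strong conservation group."""
    return a == b or any(a in g and b in g for g in STRONG_GROUPS)


def identity_similarity(s1, s2):
    """Percent identity and similarity over gap-free columns of two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    n = len(pairs)
    identical = sum(a == b for a, b in pairs)
    alike = sum(similar(a, b) for a, b in pairs)
    return 100 * identical / n, 100 * alike / n


# Hypothetical aligned fragments; the gapped fourth column is ignored.
print(identity_similarity("MKV-LE", "MRIALD"))  # (40.0, 100.0)
```

Different denominators (alignment length, shorter sequence, or gap-free columns) can shift such percentages by several points, so the convention should be reported alongside the numbers.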
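The co-expression screen in Step 5 looks for genes whose HeLa cell cycle profiles correlate with that of Aurora-A or Aurora-B by Pearson's correlation. The paper applied the clustering algorithm of Ref. 37; the sketch below is a simpler stand-in that scores each profile against a single reference and keeps those above a cutoff. The 0.7 threshold, gene labels, and profiles are hypothetical, and missing microarray values are not handled.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def coexpressed(profiles, reference, threshold=0.7):
    """Genes whose correlation with `reference` is at least `threshold`, best first."""
    ref = profiles[reference]
    scored = [(g, pearson(ref, p)) for g, p in profiles.items() if g != reference]
    kept = [(g, r) for g, r in scored if r >= threshold]
    return sorted(kept, key=lambda gr: gr[1], reverse=True)


# Hypothetical profiles over four time points; "ref" stands in for an Aurora profile.
profiles = {
    "ref": [1, 2, 3, 4],
    "gene1": [2, 4, 6, 8],
    "gene2": [4, 3, 2, 1],
    "gene3": [1, 3, 2, 4],
}
for gene, r in coexpressed(profiles, "ref"):
    print(gene, round(r, 2))  # gene1 1.0, then gene3 0.8
```

Anti-correlated genes such as gene2 (r = -1) are dropped, matching the aim of finding genes that peak when the Aurora kinases do.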
Demonstration of Biochemical Interactions Between Aurora Family Members and Molecules Identified in Silico Using Empirical Wet-Lab Techniques-Verifying interactions derived from high-throughput methods is vital for obtaining a reliable interaction network for further study. With the exception of survivin, which has been characterized extensively (24,25), the other three prioritized candidates were next tested empirically against each Aurora variant to confirm the predicted protein-protein interactions, using yeast two-hybrid analysis and co-immunoprecipitation, and potential enzyme-substrate pairs were tested by incorporation of γ-32P. In the in vitro kinase assay, histone H1, histone H2A, myelin basic protein, and P16 (48) were also incubated with the Aurora kinases as positive controls, to confirm that Aurora-A and Aurora-B had comparable kinase activity and that Aurora-C was active (see Fig. 5).
Drosophila Aurora-A and D-TACC have previously been shown to co-immunoprecipitate, and human Aurora-A forms a complex with TACC3 in tissue culture cells (21). However, it was not clear whether all Aurora family members would phosphorylate or physically associate with each TACC family member, so the interactions between the two protein families were examined in detail. In the yeast two-hybrid assay, distinct TACC family members associated with Aurora family members with different affinities, as judged by the induced β-galactosidase levels (Fig. 4): only TACC1 and TACC3 interacted with Aurora-A, whereas only TACC2 interacted with Aurora-C. Moreover, none of the TACC family members was a substrate for Aurora-B or Aurora-C, even though both TACC2 (35) and Aurora-C (14) can be found in the centrosome. In contrast, TACC3, but not TACC1 or TACC2, could be phosphorylated by Aurora-A (Fig. 5), even though both TACC1 (35) and Aurora-A (13) can associate with the centrosome. Together, these data suggest a selective involvement of Aurora and TACC family members in this interaction network.
Next, we examined the relationship between the Aurora family, Hec1, and hsNuf2. The yeast two-hybrid assay confirmed the physical association of Hec1 with hsNuf2, but neither Hec1 nor hsNuf2 associated with any Aurora family member (Fig. 4). However, the yeast homologues of Hec1 and hsNuf2, Tid3 and Nuf2, have previously been shown to form a protein complex in yeast, the Ndc80p complex, which consists of Tid3, Nuf2, Spc24, and Spc25 (Fig. 2A) (49). Because the yeast two-hybrid assay records only pairwise interactions, it can miss weaker interactions that are allosterically stabilized by additional binding partners. We therefore tested whether Aurora family members form a complex with Hec1 or hsNuf2 in tissue culture cells. Co-immunoprecipitation in 293T cells showed that Aurora-B, but not Aurora-A, formed a complex with both Hec1 and hsNuf2 (Fig. 6). This discrepancy suggests that, in human cells, only Aurora-B physically associates with a multiprotein complex similar to the yeast Ndc80p complex. Finally, the in vitro kinase assay indicated that both Hec1 and hsNuf2 are very weak substrates of Aurora-A and -B, as judged by the ~10 times longer x-ray exposure required compared with the other substrates tested in Fig. 5 (see "Discussion"). Taken together, the biochemical data not only confirmed findings reported earlier (21) but also revealed novel interactions involving Hec1 and hsNuf2. Although substantially more work is required to delineate the enzymological, structural, and specificity determinants of Aurora family-mediated phosphorylation and its role in regulating cellular function, these findings suggest that our search method may be useful for finding interacting proteins, and possibly substrates, of any evolutionarily conserved molecule.
Finally, to reduce the complexity of, and the time needed for, constructing the predicted human protein-protein interaction database, we developed an algorithm that integrates the publicly accessible databases described above. The search algorithm automatically performs the BLAST analysis and then provides the existing yeast and predicted human protein-protein interaction maps. The database, named POINT (prediction of interactome), can be accessed via insilicogenomics.nhri.org.tw:8080/POINT, where users can set different criteria to deduce the predicted interactome (example in supplementary Fig. 2) for their protein of interest. We anticipate that such an integrated platform, which can be continually upgraded, could provide a predictive understanding of a novel gene's function in its biological context.
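POINT's internal implementation is not described here, but its central idea, transferring a yeast interaction to the human counterparts of both partners, can be sketched as interolog mapping. The ortholog table is a hypothetical input that could come from BLAST or literature assignments, and one-to-many mappings (such as Ipl1 to the three human Aurora kinases) are allowed. The yeast pairs and human counterparts used below are taken from the text; the function name and data layout are illustrative only.

```python
def predict_interologs(yeast_edges, orthologs):
    """Map each yeast interaction (a, b) onto all pairs of human counterparts."""
    predicted = set()
    for a, b in yeast_edges:
        # Proteins without a human counterpart contribute no predictions.
        for ha in orthologs.get(a, ()):
            for hb in orthologs.get(b, ()):
                if ha != hb:
                    predicted.add(tuple(sorted((ha, hb))))
    return predicted


yeast_edges = [("Ipl1", "Sli15"), ("Ipl1", "Dam1"), ("Tid3", "Nuf2")]
orthologs = {
    "Ipl1": {"Aurora-A", "Aurora-B", "Aurora-C"},
    "Sli15": {"INCENP"},
    "Tid3": {"Hec1"},
    "Nuf2": {"hsNuf2"},
    # Dam1 has no known human counterpart, so Ipl1-Dam1 is not transferred.
}
for pair in sorted(predict_interologs(yeast_edges, orthologs)):
    print(pair)
```

The output pairs each human Aurora kinase with INCENP and Hec1 with hsNuf2. Transferred pairs are predictions only; as the experiments described above show, they still require empirical testing.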

DISCUSSION
The explosion of data generated by large-scale genomics-related technologies has rapidly expanded our understanding of biology. In this manuscript, we have demonstrated how bioinformatics can supplement conventional biological investigation. Our protein-protein interaction model for the Aurora homologues takes advantage of the functional databases available for different model organisms, as well as microarray expression data. It incorporates the ideas of sequence and functional conservation and of spatial and temporal segregation to elucidate the possible functions of the selected interacting proteins and/or substrates of the Aurora kinases. The four prioritized Aurora-interacting proteins identified in this study play important roles in mitotic processes such as chromosome segregation and bipolar spindle assembly, suggesting that these molecules may help to further elucidate the role of the Aurora family kinases in mitosis.
FIG. 4. Summary of yeast two-hybrid analysis. Aurora-A, Aurora-B, Aurora-C, hsNuf2, and Hec1 were subcloned into the GAL4 DNA-binding domain vector (pGBT9). The identified potential interacting proteins and three Aurora family members were subcloned into the GAL4 activation domain vector (pACT2). Both plasmids were transformed into the yeast strain Y190 and plated on SD/-Leu/-Trp/-His to select for co-transformants. Positive clones grew on Trp, Leu, and His dropout media supplemented with 3-aminotriazole and turned blue in the β-galactosidase filter assay. β-Galactosidase activity was quantified using a liquid assay. +, positive interaction; ±, weak interaction; -, no interaction; x, lethal.
FIG. 5. Aurora-A and/or Aurora-B can phosphorylate TACC3, Hec1, and hsNuf2. Recombinant Aurora-A, Aurora-B, and Aurora-C were incubated with TACC1, TACC2, TACC3, hsNuf2, and Hec1 in the presence of [γ-32P]ATP to test whether the Aurora family kinases phosphorylate these potential substrates. Autoradiography showed that Aurora-A efficiently phosphorylated TACC3, whereas no phosphorylation was observed with Aurora-B or Aurora-C. In contrast, both Hec1 and hsNuf2 (arrow) were phosphorylated by Aurora-A and -B. A star indicates autophosphorylation of Aurora-A, which ran very close to hsNuf2 on SDS-PAGE. Histone H1, histone H2A, MBP, and P16 were also incubated with each Aurora kinase as substrate controls. These controls ensured that the input Aurora-A kinase activity was similar to that of Aurora-B and that Aurora-C was active. Note that Hec1 and hsNuf2 were only very weak substrates of Aurora-A and -B, as judged by the ~10 times longer x-ray exposure required compared with the other substrates tested.
… (lanes 2 and 4), respectively, whereas Flag-Aurora-A was not (lanes 1 and 3). The negative control in lanes 5 and 6 indicates that the anti-HA antibody alone did not coimmunoprecipitate Flag-Aurora-A or -B. IP, immunoprecipitation; IB, immunoblotting.
The criteria for accepting components into the networks (the localization filter) were neither random nor biased. We chose two spatial keywords, spindle pole and kinetochore, from the cellular component terms of the Gene Ontology annotation (www.geneontology.org) (50). The search keywords can, however, be adapted to any suitable criteria, such as centrosome, centromere, or checkpoint, to search current annotations comprehensively. Interfaces provided at several web sites (such as genome-www.stanford.edu/Saccharomyces/, www.wormbase.org, www.proteome.com/databases/YPD/, bioinfo.weizmann.ac.il/cards/, and www.geneontology.org) make it possible to search for the subcellular localization of a particular gene and to retrieve information derived from different model organisms. This feature matters because intracellular events may be spatially and temporally compartmentalized, which can affect the biochemical and cellular functions of proteins such as Aurora. Integrating this concept led to the identification of CH-TOG and GCP3 (human) in our predicted interaction map (Fig. 2B). The C. elegans homologues of these two centrosomal proteins (ZYG-9 and CeCrip) are mislocalized when AIR-1 (the homologue of Aurora-A) (20) is knocked down by RNA interference, suggesting that these molecules are linked in the same functional network. Intriguingly, ZYG-9 has recently been shown to interact with the proposed TACC family homologue in C. elegans (TAC-1), further supporting the evolutionary conservation of this network (51-53).
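The localization filter described in the discussion above can be sketched as a keyword match against Gene Ontology cellular component term names. This is an illustrative sketch, not the authors' code. It assumes the annotations have already been exported into a dict of gene to term names. The function name `localization_filter` and the simplified annotations are hypothetical, although the cytosolic localization of Lrg1 and Bud2 follows the text.

```python
def localization_filter(candidates, cc_terms,
                        keywords=("spindle pole", "kinetochore")):
    """Keep candidates whose cellular-component terms mention a keyword.

    candidates: iterable of gene names, in priority order.
    cc_terms: dict mapping a gene to a list of GO cellular component
        term names (assumed to be exported from an annotation source).
    keywords: case-insensitive substrings; the defaults are the two
        spatial keywords described in the text.
    Genes with no annotation are dropped, which is one way false
    negatives can arise.
    """
    kws = [k.lower() for k in keywords]
    kept = []
    for gene in candidates:
        terms = [t.lower() for t in cc_terms.get(gene, [])]
        if any(k in t for k in kws for t in terms):
            kept.append(gene)
    return kept


if __name__ == "__main__":
    # Simplified, illustrative annotations; UNK1 is a placeholder gene.
    cc = {"NDC80": ["kinetochore"], "SPC42": ["spindle pole body"],
          "LRG1": ["cytosol"], "BUD2": ["cytosol"]}
    print(localization_filter(["NDC80", "LRG1", "SPC42", "BUD2", "UNK1"], cc))
    # → ['NDC80', 'SPC42']
```

Substring matching lets "spindle pole" also catch the yeast term "spindle pole body". Swapping in other keywords, such as centrosome or centromere, only requires changing the `keywords` argument.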
In this manuscript, we tested four of the nine targets identified at n = 2 (two-step expansion) (Fig. 2B) and confirmed that all four are either substrates of, or interact with, Aurora family members. However, we did not test any targets identified at n = 3 (three-step expansion). We therefore cannot determine the false-positive rate, that is, the proportion of predicted targets (e.g. those identified at n = 3) that are not true Aurora targets. Similarly, our model cannot accurately determine the false-negative rate, that is, the proportion of known Aurora targets that were not recovered by the bioinformatics analysis. The obstacle is that current gene annotations often do not clearly assign yeast homologues to the identified human targets, and vice versa (Fig. 2). For example, several Aurora-interacting proteins have recently been identified, namely MBD3 (54), TPX2 (55), LIM protein Ajuba (56), and RasGAP (24). However, the first two proteins have no yeast homologues according to www.ncbi.nlm.nih.gov/HomoloGene/ and bioinfo.weizmann.ac.il/cards/. Ajuba and RasGAP do have yeast homologues, Lrg1 and Bud2, respectively, but these two yeast proteins are not in our interaction network. Even if they were, they would be eliminated by our two spatial keywords (spindle pole and kinetochore) because they localize to the cytosol. This highlights the need to choose the keywords used in database analysis carefully. Moreover, during the preparation of this manuscript, three additional Aurora substrates, Ndc80, Ask1, and Spc34, were identified in yeast (57). These molecules were not used in our initial Ipl1 interaction model (Fig. 1), yet all three appear in the protein-protein interaction map centered on Ipl1 (Fig. 2A). This further supports the general applicability of our methodology.
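The "n-step expansion" can be read as collecting every protein within n interaction steps of the seed kinase. The sketch below implements that reading with breadth-first search over an undirected interaction list. It is an assumption about what n = 2 and n = 3 denote, not a description of the authors' code, and the function name `expand` and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def expand(seed, pairs, n):
    """Return {protein: steps} for proteins within n interaction steps of seed.

    pairs: iterable of (a, b) interactions, treated as undirected edges.
    Uses breadth-first search, so each protein is recorded at its
    shortest distance from the seed.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == n:
            continue  # do not expand past n steps
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


if __name__ == "__main__":
    toy = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
    print(sorted(expand("A", toy, 2).items()))
    # → [('A', 0), ('B', 1), ('C', 2), ('E', 1)]
```

Raising `n` from 2 to 3 in this sketch would add D to the toy result. This mirrors why the three-step set is larger and, untested, carries an unknown false-positive rate.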
It has been suggested that interacting partners can be found within the same gene expression cluster, so genes with similar expression patterns are generally believed to belong to the same functional category (36-41). Applying cluster analysis (37) to the mammalian HeLa cell cycle microarray database can therefore provide further insights for comparative functional genomics. However, similarity of gene expression profiles cannot always be used as an initial predictor of protein-protein interactions. For example, 50 genes share gene expression patterns similar to those of Aurora-A/B in the HeLa cell cycle microarray database (42), and it is unlikely that all 50 are Aurora-interacting proteins. To improve our model, we therefore incorporated the functional category (the temporal separation) inferred from gene expression clusters in this HeLa database (42) only at a later stage (step 5). Clustered genes co-expressed with Aurora-A/B at specific stages of the cell cycle were then used to refine and finalize the human protein-protein interaction networks. We anticipate that incorporating the expanding global surveys of gene expression will make it possible to identify and refine a selected set of genes. This may in turn provide novel insights into the molecular mechanisms underlying the function of cell cycle-regulated genes such as Aurora.
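The late-stage co-expression refinement (step 5) can be approximated as follows. Candidates that survived the earlier filters are kept only if their cell-cycle expression profile tracks that of Aurora. This sketch substitutes a simple Pearson-correlation cutoff for the cluster membership that the text derives from cluster analysis (37), so it is a stand-in rather than the study's procedure. The function names, the 0.8 threshold, and the toy profiles are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed(candidates, profiles, reference, threshold=0.8):
    """Keep candidates whose profile correlates with the reference gene.

    profiles: dict gene -> list of expression values over cell-cycle
        time points (same order for every gene).
    threshold: hypothetical cutoff, not a value from the study.
    Candidates without a profile are skipped.
    """
    ref = profiles[reference]
    return [g for g in candidates
            if g in profiles and pearson(profiles[g], ref) >= threshold]


if __name__ == "__main__":
    profiles = {"AURKA": [1, 2, 3, 4, 5], "G1": [2, 4, 6, 8, 10],
                "G2": [5, 4, 3, 2, 1], "G3": [1, 1, 1, 1, 1]}
    print(coexpressed(["G1", "G2", "G3", "G4"], profiles, "AURKA"))
    # → ['G1']
```

The filter is applied only to the already-filtered candidate list rather than genome-wide. Used on its own, a co-expression criterion would admit many genes (such as the 50 noted above) that are unlikely to be true interactors.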
The most important test of our in silico observations was to confirm the predicted protein-protein interactions with standard biochemical techniques. Of particular significance are the interactions between Aurora-A and TACC3 (21) and survivin (24, 25); three recent reports (21, 24, 25) support these in silico predictions. Moreover, the yeast homologues of Hec1 and hsNuf2, Tid3 and Nuf2, belong to the yeast Ndc80 complex, which consists of Tid3, Nuf2, Spc24, and Spc25 (Fig. 2A) and is involved in chromosome segregation (49). Human Hec1 and hsNuf2 also form a complex, as shown by the yeast two-hybrid assay (Fig. 4). Although both Hec1 and hsNuf2 are much better substrates for Aurora-A than for Aurora-B in vitro, in tissue culture cells they associate with Aurora-B but not with Aurora-A (Fig. 6). Several explanations could account for this discrepancy. For example, a transient kinase-substrate interaction would not necessarily be detected by co-immunoprecipitation unless the substrate carries an additional protein-binding domain. Alternatively, the positive in vitro phosphorylation data may simply reflect that Aurora-A and -B recognize similar substrate specificity determinants in vitro, whereas additional intracellular factors determine whether phosphorylation occurs in vivo. It is therefore important to evaluate whether the detected interactions correlate with, for example, distinct subcellular localizations. During mitosis, Aurora-A localizes to the spindle poles, whereas Hec1 and hsNuf2, like Aurora-B, localize to the kinetochore in cultured human cell lines (49). The biochemical and physical interactions of Aurora-B with Hec1 and hsNuf2 may therefore be the more biologically relevant. In conclusion, our search algorithm has led to the novel insight that Aurora-B kinase might regulate the Hec1/hsNuf2 protein complex in organisms ranging from yeast to human.
Several databases currently describe or collect intermolecular protein-protein interactions (58-62). For example, BIND (58), DIP (59), and STRING (62) hold extensive collections of human protein-protein interactions. The first two are used primarily to extract protein-protein interaction data from the literature, not to predict new interactions. STRING can predict interactions between proteins (62), but it focuses on identifying neighboring genes in their genomic context and does not include experimentally determined protein-protein interactions. By contrast, the POINT database we developed combines predicted interaction networks with publicly accessible microarray databases to provide novel insights into protein-protein interaction networks.
The POINT database will incorporate more databases, particularly from additional model organisms, as they become available. The resulting interaction model will then include parameters that help mimic potential interactions within the living cell. It may also support a predictive understanding of protein-protein interaction networks and provide guidance for target prioritization. As the genomic era continues to unfold, fostered by rapidly growing genetic and molecular data, this bioinformatics model will continue to evolve and to integrate new empirical data as they become available. The goal is a tool that predicts the protein interactome of higher eukaryotes, with particular focus on the human genome.