Differential Expression of Novel Tyrosine Kinase Substrates during Breast Cancer Development

To identify novel tyrosine kinase substrates that have never been implicated in cancer, we studied the phosphoproteomic changes in the MCF10AT model of breast cancer progression using a combination of phosphotyrosyl affinity enrichment, iTRAQ™ technology, and LC-MS/MS. Using complementary MALDI- and ESI-based mass spectrometry, 57 unique proteins comprising tyrosine kinases, phosphatases, and other signaling proteins were detected to undergo differential phosphorylation during disease progression. Seven of these proteins (SPAG9, Toll-interacting protein (TOLLIP), WBP2, NSFL1C, SLC4A7, CYFIP1, and RPS2) were validated to be novel tyrosine kinase substrates. SPAG9, TOLLIP, WBP2, and NSFL1C were further proven to be authentic targets of epidermal growth factor signaling and Iressa (gefitinib). A closer examination revealed that the expression of SLC4A7, a bicarbonate transporter, was down-regulated in 64% of the 25 matched normal and tumor clinical samples. The expression of TOLLIP in clinical breast cancers was heterogeneous with 25% showing higher expression in tumor compared with normal tissues and 35% showing the reverse trend. Preliminary studies on SPAG9, on the other hand, did not show differential expression between normal and diseased states. This is the first time SLC4A7 and TOLLIP have been discovered as novel tyrosine kinase substrates that are also associated with human cancer development. Future molecular and functional studies will provide novel insights into the roles of TOLLIP and SLC4A7 in the molecular etiology of breast cancer.

base of genetic aberrations during disease initiation, maintenance, and progression. However, disease progression is practically impossible to study in an individual because of physicians' intervention. Although the use of clinical samples from patients in different stages of cancer is one option, differences in patients' genetic and environmental backgrounds complicate analysis. Consequently, experimental breast cancer models are valuable and have been successfully used to identify molecular events during disease development (1-3). The MCF10AT model comprises a series of isogenic, xenograft-derived cell lines that mimic the different stages of breast cancer progression (4,5) and has been characterized in terms of cytogenetics, apoptosis, transforming growth factor-β signaling, and proteomics (6-12).
However, one major deficiency of most proteomics tools is their inability to detect low abundance proteins. Phosphoproteomics, on the other hand, offers several advantages. (i) It is a form of subproteomics that allows the detection of low abundance signaling proteins that would otherwise be masked by more abundant proteins in a whole proteome approach. (ii) Unlike DNA microarrays, it is an open system that allows the discovery of novel phosphoproteins in the system studied. In turn, further studies would contribute to our understanding of the roles of these novel tyrosine kinase substrates in cancer biology. (iii) Because phosphoproteomics focuses on kinases and/or their substrates, it may have implications for target-directed therapeutics, either in drug discovery or in understanding the molecular epidemiology of drug targets. For these reasons, we sought to exploit phosphoproteomics for studying disease progression, an application that has not been reported before.
We have previously developed two-dimensional PAGE and cleavable ICAT-based phosphoproteomics to dissect signaling pathways and map the differential phosphoprotein contents of cancer cell lines and breast tumors, respectively (13-15). In this study, we adapted the relatively newer isobaric tags for relative and absolute quantification (iTRAQ™) technology to profile the tyrosine phosphorylation level of proteins in the MCF10AT model of breast cancer progression. The aims of this study were to identify novel tyrosine kinase substrates that have never been implicated in cancer development and that displayed differential expression levels in the MCF10AT model. These candidates were subsequently investigated for their involvement in tumor development in the clinical setting.
and the other portion was retained for subsequent validation. For the former purpose, 60 mg of total protein from each cell line were used for purification via 4G10 anti-phosphotyrosine antibody-based immunoaffinity purification. This was performed as previously reported except that the enriched phosphoproteins were eluted with a buffer that was PBS-based rather than Tris-based because the latter is incompatible with iTRAQ reagents (14). After the enriched phosphoprotein contents of the four samples were determined, they were denatured, and the cysteines were blocked as described in the iTRAQ protocol (Applied Biosystems, Foster City, CA). Each sample was then digested with the trypsin provided in the reagent kit at 37°C overnight (16 h) and labeled with the iTRAQ tags as follows: A1, 114 tag; 1k, 115 tag; 1h, 116 tag; and 1a, 117 tag. As the pervanadate (PV)-induced phosphoproteomes from the different cell lines yielded different amounts of enriched protein, ranging from 50 to 150 µg, each sample was treated with two vials of the same isotopic tag to ensure complete labeling. The samples were then pooled and cleaned up using the cation exchange cartridge provided in the kit.
The samples were desalted, lyophilized, and analyzed using ESI-LC/MS/MS and MALDI-TOF-TOF.
On-line ESI-LC/MS/MS was performed using a QSTAR-XL hybrid quadrupole time-of-flight tandem mass spectrometer (Applied Biosystems) coupled to an LC Packings (Dionex, Sunnyvale, CA) LC system comprising a FAMOS autoinjector unit, a Switchos 10-port valve unit, and an UltiMate PLUS nanoflow pumping unit. Sample was injected into a reversed-phase C18 peptide trapping cartridge (300 µm × 5 mm, LC Packings) in a flow of 0.1% formic acid for 5 min at 25 µl/min. Following the wash step, the flow from the UltiMate PLUS was diverted back through the trapping cartridge at 100 nl/min using the Switchos. Peptides were eluted from the cartridge by application of a gradient from 0 to 90% acetonitrile in 0.1% formic acid over 40 min at 100 nl/min and separated by passing through a column that was packed in house and consisted of a 75-µm × 10-cm packed volume of 5-µm C18 reversed-phase packing (Column Engineering, Ontario, Canada). Peptides eluting from the column were sprayed directly into the orifice of the mass spectrometer, which was run in information-dependent acquisition mode selecting all 2+ to 4+ charged ions with signal intensity greater than 8 counts/s over the mass range of 300-2000 amu. For CID, nitrogen gas was used at a setting of 4, and the collision energy was set to automatic, allowing increased energy with increasing ion mass. Each fraction aliquot that was run was searched against the human subset of the International Protein Index (IPI) protein database (European Bioinformatics Institute, Hinxton, UK) using the Mascot (Matrix Science, London, UK) search engine. From the search results, a mass exclusion list was generated for each fraction based on the peptide masses of all matching peptides or up to a maximum of 2000 masses. These masses were then excluded from MS/MS fragmentation when the remaining aliquot of each fraction was rerun.
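The exclusion-list step described above can be sketched in a few lines of Python. This is only an illustration, not the software actually used: the function name, the rounding precision, and the rule of keeping the first 2000 masses in input order are all assumptions, since the text only states that the list holds the masses of all matching peptides up to a maximum of 2000.

```python
def build_exclusion_list(matched_masses, max_masses=2000, decimals=2):
    """Collect de-duplicated peptide masses for exclusion on the rerun.

    matched_masses: masses of peptides matched in the first search.
    max_masses: cap on the list size (2000 in the text).
    decimals: rounding used to merge near-identical masses (an assumption).
    """
    seen = set()
    exclusion = []
    for mass in matched_masses:
        key = round(mass, decimals)
        if key not in seen:
            seen.add(key)
            exclusion.append(key)
        if len(exclusion) == max_masses:
            break  # cap reached; later masses are not excluded
    return exclusion
```

In practice the ordering of the input (for example by identification score) would decide which masses survive the cap; that choice is not specified in the text.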
For protein identification and quantification, the complete set of data files (*.wiff) was analyzed together using ProteinPilot™ software version 2.0 (Applied Biosystems) and searched against the human subset of the UniProt protein database (release 11, downloaded in May 2007 with a total of 267,354 entries and 16,602 human entries; European Bioinformatics Institute) using the Paragon™ algorithm. The search parameters allowed for cysteine modification by methyl methanethiosulfonate, phosphorylation emphasis, and biological modifications programmed in the algorithm. The detected protein threshold (unused protscore (conf)) in the software was set to 1.3 to achieve 95% confidence. The bias correction option was not executed because this study concerned chemically induced protein phosphorylation that can take place to different extents in various cells.
For MALDI analysis, the iTRAQ-labeled peptide mixture was separated using an UltiMate™ LC system (Dionex-LC Packings) equipped with a Probot™ MALDI spotting device. The iTRAQ peptide mixture was first captured with a 0.3 × 1-mm trap column (3-µm C18 PepMap™, 100 Å) (Dionex-LC Packings) and washed with 0.05% TFA followed by gradient separation with a 0.2 × 50-mm reversed-phase column (Monolithic PS-DVB) (Dionex-LC Packings). The mobile phase A and mobile phase B used were 98% H2O, 2% ACN with 0.05% TFA and 80% H2O, 20% ACN, and 0.04% TFA, respectively. The gradient elution step was 0-60% mobile phase B in 20 min with a flow rate of 2.7 µl/min. The LC fractions were mixed with MALDI matrix solution (7 mg/ml α-cyano-4-hydroxycinnamic acid and 130 µg/ml ammonium citrate in 75% ACN) at a flow rate of 5.4 µl/min through a 25-nl mixing tee (Upchurch Scientific) before spotting onto 192-well stainless steel MALDI target plates (Applied Biosystems) using a Probot microfraction collector (Dionex-LC Packings) with a speed of 5 s/well. MALDI target plates were analyzed using an ABI 4700 Proteomics Analyzer MALDI-TOF/TOF mass spectrometer (Applied Biosystems) operating in a result-independent acquisition mode. Typically 1000 shots were accumulated for each well of sample. MS/MS analyses were performed using nitrogen at a collision energy of 1 kV and a collision gas pressure of ~1 × 10⁻⁶ torr. For precursor ions with a signal to noise ratio greater than or equal to 100, 6000 shots were combined for each spectrum. For precursors with a signal to noise ratio between 50 and 100, 10,000 shots were accumulated. GPS Explorer™ software version 3.5 (Applied Biosystems) was used to create and search files with the Mascot search engine (version 2.1; Matrix Science) for peptide and protein identifications. The IPI human database (version 3.14, released January 2006, 57,366 sequences) was used for the search, which was restricted to tryptic peptides.
Cysteine methanethiolation; N-terminal iTRAQ labeling; iTRAQ-labeled lysine; serine, threonine, and tyrosine phosphorylation; and methionine oxidation were selected as variable modifications. One missed cleavage was allowed. Precursor error tolerance was set to 100 ppm, and MS/MS fragment error tolerance was set to 0.3 Da. Maximum peptide rank was set to 2, and minimum ion score confidence interval (peptide) was set to 95%. The iTRAQ quantification was performed using GPS Explorer software version 3.5. The normalization option was not executed because this study concerned chemically induced protein phosphorylation that can take place to different extents in various cells.
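To make the two tolerance settings concrete, the following Python sketch shows how a relative (ppm) precursor tolerance and an absolute (Da) fragment tolerance are commonly evaluated. The function names are hypothetical, and this is not a reproduction of how Mascot applies the settings internally.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz, theoretical_mz, tol_ppm=100.0):
    """True if the precursor lies within the relative tolerance (100 ppm here)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz, theoretical_mz, tol_da=0.3):
    """True if the fragment lies within the absolute tolerance (0.3 Da here)."""
    return abs(observed_mz - theoretical_mz) <= tol_da
```

For a precursor near m/z 1450, a 100 ppm window corresponds to roughly ±0.145, which is why a relative tolerance widens with mass whereas the 0.3-Da fragment window does not.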
Low Density Real Time PCR Array-Cells were grown in a 10-cm² tissue culture dish until 80-90% confluence (~1 × 10⁷ exponentially dividing cells). Total RNA for each cell line was extracted using the Purescript RNA isolation kit (Gentra Qiagen, West Sussex, UK) according to the manufacturer's protocol. For first strand cDNA synthesis, 2 µg of RNA were reverse transcribed using the following conditions: 1× avian myeloblastosis virus reverse transcriptase reaction buffer (Promega), 30 units of avian myeloblastosis virus reverse transcriptase (Promega), 40 units of RNasin® ribonuclease inhibitor (Promega), 1 mM dNTP mixture (Invitrogen), and 1 µg of oligo(dT) (Invitrogen). Each RT-PCR was performed in triplicate and pooled. The integrity of the pooled cDNA was assessed by PCR amplification of a human control gene, glyceraldehyde-3-phosphate dehydrogenase. A negative control (without cDNA template) was also included to check for contaminating cDNA and genomic DNA. Evaluation of the gene expression profile of the MCF10AT model was performed using the TaqMan® Gene Expression Assay and gene-specific primers for NSFL1C (Hs00739840_mH), WBP2 (Hs00600857_m1), and glyceraldehyde-3-phosphate dehydrogenase (HS99999905_m1) from Applied Biosystems. Sample-specific PCR mixture was loaded in quadruplicate onto the microplate at 1 µg of cDNA/well and analyzed using the ABI 7900HT system. Quantification of gene expression relative to the "normal" cell line was then done using the comparative CT method as described in the manufacturer's manual.
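For readers unfamiliar with the comparative CT calculation, a minimal Python sketch of the standard 2^(-ΔΔCT) formula is given here. It assumes glyceraldehyde-3-phosphate dehydrogenase as the reference gene and the "normal" A1 line as calibrator, as in the text; the function name and any CT values passed to it are hypothetical, and the instrument software may apply additional steps.

```python
def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the comparative CT (2^-ddCT) formula.

    Each CT is first normalized to the reference gene (dCT), and the sample
    dCT is then compared with the calibrator dCT (ddCT).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

A target that crosses threshold one cycle earlier in the sample than in the calibrator, with an unchanged reference gene, therefore corresponds to a 2-fold higher relative quantity.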
Immunohistochemistry-Frozen tissues from National University Hospital and Singapore General Hospital were freshly prepared for IHC by fixing in 10% neutral buffered formalin (Sigma) for 16 h at 4°C, subjecting to a ThermoShandon tissue processor, and embedding in paraffin. Sections were warmed in a 60°C oven, dewaxed in three changes of xylene, and passaged through graded ethanol (100, 95, and 70%) before a final wash in double distilled H2O. For TOLLIP (Abgent), (Abcam), and SPAG9 (Abcam), antigen retrieval was performed via pressure cooking at 121°C for 5 min in Tris-EDTA buffer, pH 9.0. For SLC4A7 (Chemicon International Inc.), antigen retrieval was performed using the Target Retrieval Solution (DakoCytomation, Glostrup, Denmark) at 95°C for 40 min. After quenching of endogenous peroxidase activity with 3% H2O2 for 10 min and blocking with BSA for 30 min, sections were incubated at 4°C overnight with antibodies against TOLLIP at a 1:50 dilution, at a 1:50 dilution, SPAG9 at a 1:100 dilution, and SLC4A7 at a 1:50 dilution. Detection was achieved with the Envision+/horseradish peroxidase system (DakoCytomation). All slides were counterstained with Gill's hematoxylin for 1 min, dehydrated, and mounted for light microscopic evaluation. Interpretation of hematoxylin and eosin sections and analysis/scoring of IHC data were all done by the same certified pathologist to maintain consistency.

Detection and Relative Quantification of Pervanadate-induced Tyrosine Phosphorylated Proteins in MCF10AT Cell Line Model of Breast Cancer Progression-The experimental design used in this study to detect differentially tyrosine phosphorylated proteins in the MCF10AT model is shown in Fig. 1A. The cell lines in the MCF10AT model used in this study included MCF10A1, which is modeled after normal epithelium, and MCF10AT1K.cl2, MCF10CA1h, and MCF10CA1a.cl1, which are modeled after premalignant epithelium and low grade and high grade lesions, respectively (11). They are abbreviated as A1, 1k, 1h, and 1a in this study. The tumors obtained by subcutaneous injection of 1k, 1h, and 1a cancer cells into nude mice (note that A1 is not tumorigenic) grew at different rates and were of different grades as assessed by a pathologist, validating that the model was indeed reflective of disease progression (Fig. 1B and Supplemental Fig. 1). To prepare cells for phosphoproteomics analysis, cells were serum-starved overnight and treated for 15 min with 1 mM PV, a potent tyrosine phosphatase inhibitor, to enhance the presentation of tyrosine phosphorylated proteins. The lysates of the MCF10AT cells untreated or stimulated with PV were then probed with anti-phosphotyrosine antibodies to reveal the overall cellular tyrosine phosphorylation profiles (Fig. 1C, top panel). The intensities of a few proteins (marked by arrows) were stronger in 1k and 1h compared with A1 after PV stimulation. The amount of tyrosine phosphorylated proteins in 1a cells was remarkably low, suggesting that this cell line is deficient in one or more tyrosine kinases. The phosphoproteins were then affinity-captured using 4G10 anti-phosphotyrosine antibodies, trypsin-digested, labeled with iTRAQ reagents (17), and analyzed using tandem mass spectrometry to determine the relative phosphoprotein levels in the MCF10AT cells. To increase the coverage of protein identifications and/or the confidence of the data generated, two separate preparations were made, and each was analyzed by ESI-LC/MS/MS and MALDI-TOF-TOF.
The cells A1, 1k, 1h, and 1a were labeled with iTRAQ reagents 114, 115, 116, and 117, respectively. The ratios 115:114, 116:114, and 117:114 would indicate the relative abundance of a potentially tyrosine phosphorylated protein in 1k, 1h, and 1a with respect to A1. Table I shows the detection (gene symbol, column 2; protein name, column 3) and relative quantity (columns 6-8) of 4G10 antibody-enriched proteins in the various cell lines. The S.D. and error factor (EF) shown in columns 9-11 as confidence indicators for relative quantification were provided by ProteinPilot software for ESI-based data and by the GPS Explorer software for the MALDI-based dataset. Additional information such as the peptide sequence, m/z value, and sites of iTRAQ modification of proteins detected via single peptide assignment is provided in Supplemental Table 1. The protein detection (including group reporting) and quantification data from MALDI- and ESI-based LC-MS/MS analyses from which the information in Table I was derived are provided as Supplemental Table 2 (A and B) and Supplemental Table 3, respectively. Not all the proteins identified by the MALDI platform (Supplemental Table 2B) were included in Table I. Only those protein hits that had a minimal best/total ion score of 95% were selected to maintain a high confidence. For the ESI/ProteinPilot-based method, the protein threshold (unused protscore (conf)) was set to 1.3 to achieve 95% confidence. This generated Supplemental Table 3 where all the hits were of 95% confidence. Hence they were all included in Table I. In Supplemental Tables 2 and 3, protein hits were either reported as unique or as a group of proteins/isoforms (when the peptide sequence(s) did not have sufficient discriminatory power). In the latter case, a protein member, usually the one with the highest rank/score in the database search, would be listed in tables as a representative of that group.
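The ratio convention above (each reporter channel divided by the 114 channel of the A1 cells) can be illustrated with a short Python sketch. The function name and any intensities passed to it are hypothetical; ProteinPilot and GPS Explorer compute these ratios internally and may weight or average peptides in ways not shown here.

```python
def ratios_to_reference(reporter_intensities, reference_tag=114):
    """Express each iTRAQ reporter intensity relative to the reference channel.

    reporter_intensities: dict mapping reporter tag (114-117) to intensity.
    Returns a dict such as {"115:114": ..., "116:114": ..., "117:114": ...}.
    """
    reference = reporter_intensities[reference_tag]
    if reference <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {
        f"{tag}:{reference_tag}": intensity / reference
        for tag, intensity in sorted(reporter_intensities.items())
        if tag != reference_tag
    }
```

With this convention a ratio above 1 indicates more of the enriched protein in the given cell line than in A1, and a ratio below 1 indicates less.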
To determine cutoff values to classify proteins as differentially expressed, we calculated the median S.D. for MALDI-based data and median EF for ESI-based data. The median worked out to be 0.27 and 1.2 for S.D. and EF, respectively. The upper (1.27 (for S.D.) and 1.2 (for EF)) and lower (0.73 (for S.D.) and 0.83 (for EF)) ranges were then applied to the MALDI- and ESI-based data. Proteins with an iTRAQ ratio below the lower range were considered to be underexpressed, whereas those above the higher range were considered to be overexpressed. Taking a cutoff point at 30% variation is acceptable because the main variable in this study was technical variation, whereas the biological variation was minimized by the sample pooling effect. Another study also reported that at least a 30% variation needed to be taken into consideration for technical variations in large scale protein identification and quantification using the iTRAQ approach (18). Although this method has its limitation in that it uses a general yardstick on every protein, it is only a first pass screening that will be followed up with validation work later. Based on the relative ratios of proteins detected in various cell lines, the proteins were categorized into seven different trend plots shown in column 1 of Table I

Duplicate sets of experiments were conducted. MCF10AT cell lines were treated with 1 mM pervanadate for 15 min. The cell lysates were then separately incubated with 4G10 antibodies covalently conjugated to Sepharose beads. Enriched phosphoproteins from each cell line were then separately labeled with iTRAQ tags (114, 115, 116, and 117). The labeled samples were subsequently pooled and digested with trypsin. Labeled peptides were purified using an avidin column and analyzed with ESI- and MALDI-based mass spectrometry as described in detail under "Experimental Procedures." B, histopathology of the tumors derived from MCF10AT cell lines 1k, 1h, and 1a cells showing preneoplastic lesion and low and high grade cancers, respectively. Note that A1 cells are not tumorigenic and therefore not shown here. Xenograft-derived tumors were processed as described under "Experimental Procedures," and hematoxylin and eosin staining was performed. Left panel, AT1k-derived tumor with ductal carcinoma in situ (ii) and microinvasion of stroma (i); middle panel, invasive papillary carcinoma (iii) in CA1h tumor with stromal invasion of malignant epithelial cells (iv); right panel, nests and trabeculae of infiltrating malignant cells of poorly differentiated invasive ductal carcinoma (v). C, pervanadate-induced tyrosine phosphorylation in MCF10AT cell lines. Cells were serum-starved overnight and then untreated or treated with PV at 1 mM for 15 min. Proteins in lysates were then resolved and immunoblotted (IB) with anti-phosphotyrosine antibodies conjugated to horseradish peroxidase (PY20H). The level of ezrin, as detected by anti-ezrin antibodies, was used as a control for equal loading of lysates. Arrows indicate examples of protein bands that displayed differential tyrosine phosphorylation across the four cell lines. O, serum-starved cells not treated with PV.

identified via the MALDI and ESI methods, respectively. Of the total 76 proteins detected by both methods, 19 hits (25%) were common. These 19 proteins have both accession (column 5) and IPI numbers (column 4) in the table as they were detected via ESI- and MALDI-based platforms with search algorithms run against different databases as mentioned under "Experimental Procedures." Twenty hits were unique to the MALDI-based method (column 4 only), whereas 18 were unique to the ESI-based method (column 5 only). This resulted in a total of 57 unique protein hits detected and relatively quantified.
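The first-pass cutoff screen described earlier in this section (lower and upper ranges of 0.73 and 1.27 for the MALDI-based data, and 0.83 and 1.2 for the ESI-based data) amounts to a simple threshold rule. The Python sketch below is an illustration under that reading; the dictionary, function name, and example ratios are hypothetical and not part of the original analysis software.

```python
# Lower and upper cutoff ranges quoted in the text for each platform.
CUTOFFS = {
    "MALDI": (0.73, 1.27),  # derived from the median S.D. of 0.27
    "ESI": (0.83, 1.2),     # derived from the median EF of 1.2
}


def classify_ratio(ratio, platform):
    """Label an iTRAQ ratio (relative to A1) as under, over, or unchanged."""
    lower, upper = CUTOFFS[platform]
    if ratio < lower:
        return "underexpressed"
    if ratio > upper:
        return "overexpressed"
    return "unchanged"
```

Because the two platforms use different ranges, the same ratio can be classified differently depending on where it was measured: a ratio of 0.8 falls inside the MALDI range but below the ESI lower range.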
By looking at the trend plot in column 1, six of the 19 common hits (about 30%), as indicated by superscript 1-6 in column 3, exhibited dissimilarity at only one data point (either 116:114 or 117:114). Otherwise they displayed an overall similar trend of expression during disease progression. The other 13 hits (shaded in Table I for easy reference) were observed to display a similar trend at all data points despite being analyzed by different platforms and search algorithms, indicating the robustness of the methods used. To facilitate better understanding of the 57 unique proteins detected, they were grouped according to their reported molecular functions using the PANTHER (Protein Analysis through Evolutionary Relationships) Classification System (www.pantherdb.org/). Protein classes such as kinases, phosphatases, and other signaling molecules were identified (Fig. 2A).
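The protein counts quoted for the two platforms can be cross-checked with simple bookkeeping. The Python sketch below uses only the numbers stated in the text (20 MALDI-only hits, 18 ESI-only hits, 19 common hits, and six common hits that differed at one data point); the function name and return structure are hypothetical.

```python
def overlap_summary(maldi_only, esi_only, common, discordant_common):
    """Tally unique and per-platform protein counts from the overlap figures."""
    unique_total = maldi_only + esi_only + common
    # Common hits are counted once per platform in the combined detections.
    combined_detections = (maldi_only + common) + (esi_only + common)
    return {
        "unique_total": unique_total,
        "combined_detections": combined_detections,
        "common_fraction": common / combined_detections,
        "discordant_fraction": discordant_common / common,
        "concordant_common": common - discordant_common,
    }


summary = overlap_summary(maldi_only=20, esi_only=18, common=19,
                          discordant_common=6)
```

Under this reading, the figures are mutually consistent: 57 unique proteins, 76 detections across the two platforms, 19/76 = 25% in common, and 6/19 (about 32%, quoted as about 30%) differing at one data point, leaving 13 fully concordant hits.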
A considerable number of the proteins were detected on the basis of a single peptide assignment (Table I, last column). This is not surprising because the study was designed to detect tyrosine phosphorylated signaling proteins that are well known to exist in low abundance. In addition, very few tyrosine phosphorylation sites were found, probably because enrichment was performed at the phosphoprotein level instead of the phosphopeptide level. Note that because of single peptide assignment, most of these proteins did not have an S.D. or EF unless they had multiple MS2 identifications. One MS2 spectrum each for all proteins detected by MALDI- and ESI-based platforms is provided in Supplemental List 1. The spectra are arranged in the order in which the proteins appear in Table I for easy reference. As a representative, a MALDI-derived MS2 spectrum of SPAG9 is shown in Fig. 2B.
Phosphoproteins were categorized into different molecular functions using the PANTHER Classification System (www.pantherdb.org/) analysis. Although 57 unique proteins were detected, a total of 74 assignments were obtained and sorted into 16 molecular function classifications. This is because some proteins (e.g. EGFR) may be assigned more than one molecular function. B, detection and relative quantification of tyrosine phosphorylation levels of proteins in PV-treated MCF10AT cells. Enriched phosphotyrosyl proteins from A1, 1k, 1h, and 1a cell lines were labeled with iTRAQ tags 114, 115, 116, and 117, respectively. Following MS2 analysis, iTRAQ reporter and b and y ions of various peptides were detected. As a representative, the MALDI-based MS2 spectrum for the single SPAG9 peptide detected (HIEVQVAQETR, singly charged, precursor m/z = 1453.7902, iTRAQ label tagged at N terminus) is shown. The inset shows an example of the iTRAQ reporter ion sets used for relative quantification of SPAG9 across the four cell lines. SFN, 14-3-3 protein sigma; JUP, junction plakoglobin; ITGA6, integrin alpha-6 precursor (VLA-6); CTNNA1, catenin alpha-1; CTNNB1, splice isoform 1 of beta catenin; DBNL, drebrin-like protein; FLNB, splice isoform 2 of filamin-B; PXN, splice isoform gamma of paxillin; TUBB, tubulin, beta polypeptide.

Validation of Novel Tyrosine Kinase Substrates-The major aims in this study were (i) the identification of novel tyrosine kinase substrates and (ii) identification of novel tyrosine kinase substrates that displayed differential expression during breast cancer progression. Two key factors must be considered. First, in our previous study on EGF-stimulated A431 cells, non tyrosine phosphorylated proteins could be detected in the 4G10-purified mixture (13). These were not due to nonspecific enrichment because we have shown that all the proteins enriched by the 4G10 antibodies, regardless of whether they are tyrosine phosphorylated or non-phosphorylated proteins associated with phosphorylated proteins, were specifically purified by the antibodies (14). This conclusion was drawn from the observation that the proteins enriched by 4G10 antibodies when A431 cells were treated with EGF were totally absent when the same EGF-treated cells were first pretreated with Iressa, a selective EGFR inhibitor.
Hence it is possible that not all the proteins listed in Table I are authentic tyrosine phosphorylated proteins. Second, the amount of phosphorylated proteins at steady state is maintained by the action of kinases and phosphatases. Pervanadate treatment blocks the action of tyrosine phosphatases, thus favoring kinase activity and shifting the equilibrium toward phosphorylated proteins. The differential amount of tyrosine phosphorylated proteins induced by PV treatment of various cell lines could therefore be due to (i) different expression levels of the tyrosine kinase substrates or (ii) different amounts of tyrosine kinases and/or phosphatases. To investigate the likely mechanism behind the observed differential protein phosphorylations during breast cancer progression and to determine the authenticity of tyrosine phosphorylation of potentially novel phosphoproteins, immunoprecipitations were performed on 12 well known tyrosine phosphorylated proteins and seven proteins of interest. Four of the seven proteins (SPAG9, CYFIP1, RPS2, and TOLLIP) are potentially novel tyrosine kinase substrates as revealed by extensive literature search in PubMed using the many aliases listed in Genecards®, examination of published works (19-26), and interrogation of other websites, e.g. the phosphorylation site database (www.phosida.com/) and Phosphosite® (Cell Signaling Technology). Three of the seven proteins (WBP2, SLC4A7, and NSFL1C) have very recently been identified as potential novel tyrosine kinase substrates by others but were never validated (19,20,25,26). More importantly, they have never been investigated for potential roles in cancer development, which we will describe in subsequent sections.
FIG. 3. Validation of known and potentially novel tyrosine phosphorylated proteins identified in MCF10AT cells. Cells were serum-starved overnight and then untreated or treated with PV at 1 mM for 15 min. Immunoprecipitations (IP) using the indicated protein-specific antibodies were performed in duplicate on the lysates. The immunoprecipitates were immunoblotted (IB) with anti-phosphotyrosine antibodies conjugated to horseradish peroxidase (PY20H) and protein-specific antibodies to reveal the relative tyrosine phosphorylation and expression levels, respectively, across the four MCF10AT cell lines. A, validation of known tyrosine phosphorylated proteins. B, validation of proteins that have never been reported or validated to be tyrosine phosphorylated. For SPAG9, CYFIP1, RPS2, TOLLIP, and SLC4A7 proteins, analyses were performed as described above. In the case of WBP2 and NSFL1C where commercial antibodies were not available, their cDNAs were cloned into pcDNA3.1 Directional TOPO vector that allowed expressed proteins to be tagged with a V5 sequence at the C terminus. NSFL1C or WBP2 cDNA was then transfected into HEK293T cells. After 72 h, HEK293T cells were untreated or treated with 1 mM PV for 15 min and lysed. Immunoprecipitation of NSFL1C and WBP2 was carried out in duplicate sets using anti-V5 antibodies, and each set was immunoblotted with either PY20H or anti-V5 antibodies. O, serum-starved cells not treated with PV; UT, untransfected cells. C, measurement of NSFL1C and WBP2 transcripts in MCF10AT cells using real time PCR. Following extraction of total RNA, real time PCR was conducted using gene-specific probes for NSFL1C and WBP2 as described in detail under "Experimental Procedures." The relative quantity minimum/maximum range (bar) was calculated based on built-in software in the ABI 7900HT machine. It provides a range where the "true" gene expression value lies based on the 95% confidence interval. We classified a gene as significantly up-regulated when the minimum value in one condition was higher than the maximum value of the same gene in another condition. The reverse was true.

For consistency, immunoprecipitation was conducted on the same lysates used to generate the phosphoproteomics data in Table I. Two sets of immunoprecipitates were prepared. One set of immunoprecipitates was probed with anti-phosphotyrosine PY20 antibodies, and the other set was probed with protein-specific antibodies. The dataset for the 12 well known tyrosine phosphorylated proteins is shown in Fig. 3A. All 12 proteins, namely FAK, Paxillin, SHP2, p130CAS, plakoglobin, EPHA2, SHC, β-catenin, EGFR, EPS8, E-cadherin, and Cortactin, were confirmed to be tyrosine phosphorylated. There are no commercially available antibodies for two of the seven potentially novel phosphoproteins, WBP2 and NSFL1C. Consequently we subcloned the WBP2 and NSFL1C cDNA into pcDNA3.1 Directional TOPO vector that allowed expressed proteins to be tagged with a V5 sequence at the C terminus. NSFL1C or WBP2 cDNA was then transfected into HEK293T cells. After 72 h, HEK293T cells were untreated or treated with 1 mM PV for 15 min and lysed. Immunoprecipitation of NSFL1C and WBP2 was carried out in duplicate sets using anti-V5 antibodies, and each set was immunoblotted with either PY20H or anti-V5 antibodies. The immunoprecipitation results of WBP2, NSFL1C, and five other potentially novel tyrosine kinase substrates are presented in Fig. 3B. In summary, all seven proteins were validated to be authentic tyrosine phosphorylated proteins. No NSFL1C or WBP2 protein was immunoprecipitated by anti-V5 antibodies in the untransfected cells, and very little or no phosphotyrosine signal was observed in these untreated or treated cells.
In contrast, NSFL1C and WBP2 proteins in transfected HEK293T cells were enriched by anti-V5 antibodies and were clearly tyrosine phosphorylated in PV-stimulated but not in untreated HEK293T cells. From Fig. 3, A and B, we observed that the tyrosine phosphorylation levels of EGFR, EPS8, SHC (p66), and EPHA2 correlated with protein expression levels across all four cell lines. In contrast, there was no or limited correlation between tyrosine phosphorylation and expression trends for the rest of the proteins, indicating that the differential tyrosine phosphorylations observed for the rest of the proteins were likely due to aberrant tyrosine kinase or phosphatase activity rather than their expression levels. This is conceivable because EGFR, FAK, and EPHA2 tyrosine kinases were found to be aberrantly expressed in the MCF10AT model (Fig. 3A). Fig. 3A, immunoblotting of the immunoprecipitates of the 12 well known phosphorylated signaling proteins with protein-specific antibodies revealed several proteins that displayed differential expression during breast cancer progression. They include FAK, p130CAS, EPHA2, EGFR, EPS8, and E-cadherin. Epithelialmesenchymal transition is a cell transformation process that is widely accepted to be part of cancer development. This process is marked by loss of E-cadherin, an epithelial cell marker, and increased expression of Vimentin, a marker for cells of mesenchymal origin. Indeed we were able to observe diminished E-cadherin expression (Fig. 3A) and increased expression of Vimentin in the MCF10AT model (Supplemental Fig. 2). In addition, we detected increasing Vimentin tyrosine phosphorylation from A1 through 1a cells (Supplemental Fig.  2A), and this supports the detection of enhanced tyrosine phosphorylated Vimentin in breast tumors compared with normal tissues that we made a couple of years ago (15). FAK tyrosine kinase and EPHA2 and EGFR receptor tyrosine kinases are popularly implicated in various cancers (27). 
Other proteins such as p130CAS have been found to be aberrantly expressed in human or metastatic cancers (28). Collectively these observations reflect the robustness of the model and analytical systems used in detecting molecular changes during cancer development.
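The range-based rule used to call a gene up- or down-regulated can be written as a short function. The following is only a minimal sketch of that rule as described in the text: the function name, the tuple representation of a range, and the numeric values are illustrative assumptions and are not data from this study.

```python
from typing import Tuple

Range = Tuple[float, float]  # (minimum, maximum) of the expression range


def classify_expression(test: Range, reference: Range) -> str:
    """Classify a gene by comparing two non-overlapping expression ranges.

    A gene is called up-regulated when the minimum of the test range lies
    above the maximum of the reference range, and down-regulated in the
    reverse case. Overlapping ranges are called unchanged.
    """
    test_min, test_max = test
    ref_min, ref_max = reference
    if test_min > test_max or ref_min > ref_max:
        raise ValueError("each range must be given as (minimum, maximum)")
    if test_min > ref_max:
        return "up-regulated"
    if test_max < ref_min:
        return "down-regulated"
    return "unchanged"


# Hypothetical relative mRNA ranges (not measured values from this study).
normal_cells = (0.9, 1.1)
cancer_cells = (1.6, 1.9)
print(classify_expression(cancer_cells, normal_cells))
```

Because the rule requires the two ranges not to overlap at all, it is conservative: a gene whose ranges touch or overlap is left unclassified regardless of how far apart the mean values are.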

Novel Tyrosine Kinase Substrates and Their Expression during Breast Cancer Progression-
Among the novel tyrosine kinase substrates for which we have antibodies, only SLC4A7 (decreased) and TOLLIP (increased) displayed differential expression during breast cancer progression (Fig. 3B). As we did not have antibodies against WBP2 and NSFL1C, we resorted to real time PCR. Fig. 3C shows the mRNA levels of NSFL1C and WBP2 across the four cell lines in the breast cancer progression model. The measurements were made in four replicates, and the bars represent the minimum and maximum values, i.e. the true mRNA level could be as low or as high as these values. For our classification, a gene was considered overexpressed if its minimum value in one condition was higher than the maximum value of the same gene in another condition. The reverse applied for underexpression. Based on this system, there was an increase and decrease of NSFL1C and WBP2 mRNA, respectively, when normal (A1) and high grade cancer (1a) cells were compared. Whether the gene products or proteins display a similar trend would require further validation using antibodies, which are not available commercially at this moment. Interestingly NSFL1C is a cofactor of VCP, which was also found to be differentially tyrosine phosphorylated in our study (Table I). Table II shows the expression data of the seven novel tyrosine kinase substrates obtained in this study as well as the published information on the subcellular localization, function, and potential involvement in cancer development of the seven novel phosphoproteins to provide an overview of the candidate proteins with respect to cancer/tumor biology. Although the MCF10AT model has been used by various groups, a major limitation of such in vitro/animal models is that they frequently lack the physiological context present in the human body.
Consequently we selected SLC4A7 and TOLLIP for further studies in clinical human samples because they represented novel tyrosine kinase substrates that were also observed to be differentially expressed during breast cancer progression. Although SPAG9 was not found to be significantly differentially expressed in the MCF10AT model, it also was chosen for a closer examination because of a study that reported amplification of the 17q12-23 region (where SPAG9 resides) in breast cancer (29). NSFL1C has been validated to be a novel phosphotyrosyl protein and shown by real time PCR to be overexpressed in breast cancer cells. However, it could not be evaluated further because of the lack of antibodies. A preliminary immunohistochemistry (IHC) study on SPAG9, TOLLIP, and SLC4A7 was conducted on five matched normal and breast tumor samples of Asian origin. Consistent with the results obtained from the MCF10AT model, differential expression of TOLLIP and SLC4A7 but not SPAG9 was apparent between normal and tumor cells (only data for SPAG9 are shown in Supplemental Fig. 3). Subsequently we conducted IHC of SLC4A7 and TOLLIP on an additional 20 matched clinical samples. In addition to SLC4A7 and TOLLIP, IHC on two other molecules highly relevant to clinical breast cancer, namely ER and ErbB2, was also performed for comparison. The complete raw IHC results are provided as Supplemental Table 4 and summarized in Table III. In 64% (14 of 22) of all cases, there was more SLC4A7 expression in normal compared with tumor sections. Note that three matched cases could not be analyzed because of the lack of ductal components in the normal section. On the other hand, 14% displayed more SLC4A7 in tumor compared with normal tissues, whereas 23% did not show differential expression between normal and diseased samples. This demonstrated that loss of SLC4A7 expression occurred in the majority of breast cancer cases studied.
Representative data on the IHC of SLC4A7 showing more expression in normal ducts compared with tumor section are shown in Fig. 4, top panels. The differential expression of TOLLIP among normal and tumor sections was more heterogeneous with 25 and 35% showing elevated and diminished expression, respectively, in tumor compared with normal tissues. No differential expression was observed in the rest of the cases. Representative data showing more expression of TOLLIP in tumor compared with normal breast and vice versa are shown in Fig. 4, middle and bottom panels, respectively. From the IHC data, SLC4A7 was clearly localized in the cell membrane (especially in the tumor histological section), whereas TOLLIP was predominantly cytoplasmic. Both observations are consistent with the literature (Table II). ErbB2 was overexpressed in close to 60% (13 of 22) of the cases when matched normal and tumor samples were analyzed (Table III). In invasive carcinomas alone, ErbB2 expression could be detected (a score of 1+ and above) in 82.6% (19 of 23) of the cases (Supplemental Table 4). If we define a score of 2+ or more as overexpression, then ErbB2 was overexpressed in 30% (7 of 23) of all tumor cases. This is consistent with the 20-25% reported originally by Slamon et al. (30). Similar to existing reports, 53% of all cases analyzed here were positive (as defined by 1+ and above) for ER expression (31) (Table III). These data support the robustness of the data obtained.
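The percentages quoted for the IHC results follow directly from the case counts, and the arithmetic can be checked with a few lines of code. This is a minimal sketch only: the helper name `percent` is hypothetical, and the counts of 3 and 5 cases for the two smaller SLC4A7 categories are assumptions inferred from the reported 14% and 23% of 22 analyzable pairs; only 14 of 22 is stated explicitly in the text.

```python
def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# SLC4A7 calls in the analyzable matched pairs. 14 of 22 is reported in the
# text; 3 and 5 are assumptions inferred from the reported 14% and 23%.
slc4a7_calls = {
    "higher in normal": 14,
    "higher in tumor": 3,
    "no difference": 5,
}
analyzable_pairs = sum(slc4a7_calls.values())
slc4a7_percentages = {
    label: percent(count, analyzable_pairs)
    for label, count in slc4a7_calls.items()
}

# ErbB2 counts as reported in the text.
erbb2_percentages = {
    "overexpressed in matched pairs (13 of 22)": percent(13, 22),
    "detectable, score 1+ and above (19 of 23)": percent(19, 23),
    "score 2+ or more (7 of 23)": percent(7, 23),
}

for label, value in {**slc4a7_percentages, **erbb2_percentages}.items():
    print(f"{label}: {value}%")
```

Under these assumed counts the three SLC4A7 categories sum to the 22 analyzable pairs, which is consistent with three of the 25 matched cases having been excluded.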

SLC4A7 and TOLLIP Expression in Clinical Breast Samples-
SPAG9, NSFL1C, WBP2, and TOLLIP Are Validated as Novel Tyrosine Phosphorylation Targets of EGF Signaling-One striking observation that was consistent between the data in Table I and Fig. 3 was the marked loss of tyrosine phosphorylation in almost all proteins examined (except FAK and EPHA2) in 1a cells. Coincidentally EGFR expression exhibited a trend that bore similarity to the tyrosine phosphorylation pattern of these proteins. In other words, loss of EGFR in late stage 1a cancer cells correlated with the diminished tyrosine phosphorylation levels of a considerable number of proteins detected here. Note that the expression of EGFR has since been further examined in close to 100 clinical breast samples. In 93% of the cases, there was more EGFR in normal epithelial cells than in breast tumor cells, suggesting that loss of EGFR during breast cancer progression is clinically relevant (48). Hence we postulated that EGFR might be the kinase responsible for phosphorylating the novel substrates identified. To test this hypothesis, the tyrosine phosphorylation levels of these novel substrates were examined when cells were untreated or stimulated with EGF. To increase the specificity of the data, cells were either untreated or pretreated with Iressa, a highly selective EGFR inhibitor, at 10 μM for 1 h prior to stimulation with EGF. Immunoprecipitations of SPAG9, CYFIP1, RPS2, TOLLIP, and SLC4A7 were conducted in A431 cells untreated or treated with EGF at 50 ng/ml for 30 min. Immunoprecipitations were also performed for NSFL1C and WBP2 from EGFR-cotransfected HEK293T cells untreated or treated with EGF at 50 ng/ml for 2 min. The EGF time point selected for each cell line had been optimized previously (data not shown).
Immunoprecipitates were separately probed with anti-phosphotyrosine and protein-specific antibodies to reveal the tyrosine phosphorylation and expression levels of the proteins, respectively. Fig. 5A, top panel, shows the induction and abrogation of overall tyrosine phosphorylation in EGF-treated A431 cells. Immunoprecipitates of SPAG9 and TOLLIP showed a weak basal phosphotyrosine signal in unstimulated cells that increased upon EGF stimulation (Fig. 5A, bottom  panels). Abrogation of the EGF-induced increase in tyrosine phosphorylation of these proteins by Iressa implied that these are authentic and specific targets of EGF signaling. Similarly induction of NSFL1C and WBP2 tyrosine phosphorylation by EGF in transfected HEK293T cells was also specifically inhibited by Iressa pretreatment (Fig. 5B). Note that the tyrosine phosphorylation levels of TOLLIP and NSFL1C were not completely abolished by Iressa suggesting that both proteins were constitutively phosphorylated probably by another tyrosine kinase in unstimulated cells. Hence SPAG9, TOLLIP, NSFL1C, and WBP2 were all authentic tyrosine phosphorylation targets of EGF signaling. On the other hand, tyrosine phosphorylation of CYFIP1, SLC4A7, and RPS2 was not induced by EGF treatment suggesting that they are probably downstream substrates of other growth factor signaling (data not shown). Table I only shows a list of proteins that were induced to undergo tyrosine phosphorylation by pervanadate treatment. It lacks the biochemical context. To create significance out of otherwise static phosphoproteomics data, we constructed a biological interaction network (BIN) of the proteins identified in the PV-induced phosphotyrosine proteome (Fig. 6). This was generated with PathwayAssist from Ariadne Genomics. Databases that were used include the Kyoto Encyclopedia of Genes and Genomes pathway database (32)(33)(34), Biomolecular Interaction Network Database (35), and ResnetCore from Ariadne Genomics. 
We also incorporated information from the breast cancer category of the PathArt database, which is proprietary to Jubilant Biosys. The proteins that could be networked were linked by various relationships such as protein interactions, modifications including phosphorylation, and regulation of expression. These relationships are color-coded, and the legends are provided next to the map. The network does not imply that all the interactions took place within a single spatial and temporal situation, but it strongly suggests that the majority of the proteins identified in this study were integral parts of protein complexes. The two proteins with the greatest number of connections were EGFR and FAK1/PTK2, suggesting that these proteins represent central nodes in the mapped phosphoproteome.
FIG. 5. Identifying novel tyrosine phosphorylation targets of EGF signaling. A431 cells were serum-starved overnight and then untreated or treated with EGF at 50 ng/ml for 30 min. Where indicated, cells were also pretreated or not pretreated with Iressa at 10 μM for 1 h. A, top panel, whole cell lysates were probed with antiphosphotyrosine antibodies to show the induction and abrogation of tyrosine phosphorylation signals in EGF-treated cells. Bottom panels, immunoprecipitations (IP) using the indicated protein-specific antibodies were performed in duplicates on the lysates. The immunoprecipitates were subsequently immunoblotted (IB) with anti-phosphotyrosine antibodies conjugated to horseradish peroxidase (PY20H) and protein-specific antibodies to reveal the relative tyrosine phosphorylation and expression levels, respectively. B, WBP2 and NSFL1C cDNAs were cloned into pcDNA3.1 Directional TOPO vector that allowed expressed proteins to be tagged with a V5 sequence at the C terminus. NSFL1C or WBP2 cDNA was then co-transfected with EGFR into HEK293T cells. After 72 h, HEK293T cells were untreated or treated with EGF at 50 ng/ml for 2 min and lysed. Immunoprecipitations of NSFL1C and WBP2 were carried out in duplicate sets using anti-V5 antibodies. The immunoprecipitates were subsequently immunoblotted with anti-phosphotyrosine antibodies conjugated to horseradish peroxidase (PY20H) and V5 antibodies to reveal the relative tyrosine phosphorylation and expression levels, respectively. The double arrows in the bottom panel point to the native and possibly phosphorylated forms of WBP2. The single arrow points to the position of tyrosine-phosphorylated NSFL1C.

Several proteins remained orphans because there is insufficient information in the databases to link them to other proteins in the BIN. With the validation of SPAG9, NSFL1C, WBP2, and TOLLIP as authentic tyrosine kinase substrates in EGF signaling, these proteins could now be formally linked to the EGFR node in the BIN (see Footnote 2).
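The notions of highly connected central nodes and unlinked orphans in an interaction network can be made concrete by counting how many edges touch each node. The sketch below is illustrative only: it does not use PathwayAssist, and the edge list and the choice of orphan proteins are a hypothetical toy subset assumed for demonstration, not the network generated in this study.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]


def node_degrees(nodes: Iterable[str], edges: Iterable[Edge]) -> Dict[str, int]:
    """Count the number of edges touching each node of an undirected network."""
    counts: Counter = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return {node: counts[node] for node in nodes}


# Hypothetical toy network (assumed for illustration, not the published BIN).
toy_nodes: List[str] = [
    "EGFR", "FAK", "SPAG9", "TOLLIP", "NSFL1C", "WBP2",
    "SHC", "EPS8", "Paxillin", "p130CAS", "VCP", "CYFIP1", "RPS2",
]
toy_edges: List[Edge] = [
    ("EGFR", "SPAG9"), ("EGFR", "TOLLIP"), ("EGFR", "NSFL1C"),
    ("EGFR", "WBP2"), ("EGFR", "SHC"), ("EGFR", "EPS8"),
    ("FAK", "Paxillin"), ("FAK", "p130CAS"), ("NSFL1C", "VCP"),
]

degree = node_degrees(toy_nodes, toy_edges)
hubs = sorted(degree, key=degree.get, reverse=True)[:2]
orphans = sorted(node for node, count in degree.items() if count == 0)

print("most connected nodes:", hubs)
print("orphan nodes:", orphans)
```

In this toy example the node with the most edges is EGFR, and the proteins with no edges are reported as orphans; adding an edge from a newly validated substrate to EGFR removes that protein from the orphan list.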

DISCUSSION
In this study, we used the MCF10AT model and phosphoproteomics as the biological and analytical systems, respectively, to identify expression changes in novel tyrosine kinase substrates during breast cancer progression. The xenograft-derived MCF10AT model comprises isogenic derivatives of the MCF10A mammary epithelial cell line that represent different stages of breast cancer progression (5,36). The MCF10AT cells were derived from MCF10A normal mammary epithelial cells transfected with the T24 constitutively active RAS mutant. Although RAS mutations are rare in breast cancer, RAS is commonly activated in breast cancer by overexpressed growth factor receptors, which signal through RAS (37). Hence this model is likely to reflect a subset of human breast cancers and their progression. Indeed many observations obtained in this study supported this notion. They include the differential expression of known phosphotyrosyl proteins such as E-cadherin and Vimentin during epithelial-mesenchymal transition and of FAK, EPHA2, and p130CAS during cancer development. In addition, the loss of EGFR during breast cancer progression observed in this dataset has subsequently been validated in a separate study where close to 100 clinical breast samples were analyzed. About 93% of all cases expressed much less EGFR in tumor compared with normal tissues (48).
In a considerable number of proteins detected in this study, the relative levels of tyrosine phosphorylation as revealed by the protein-specific immunoprecipitation method in Fig. 3, A and B, were dissimilar (except for EGFR, p130CAS, and p66 isoform of SHC) from that detected by the phospho-iTRAQ method in Table I. Note that datasets from both methods were derived from at least two experiments, and consistent results were obtained within each method. Such discrepancies were also observed in other phosphoproteomics studies. For example, in the study by Blagoev et al. (19), the degree of tyrosine phosphorylation of breast cancer anti-estrogen resistance and ARP3 at time points 0 and 20 min was not consistent between the phosphoproteomics and the immunochemistry methods. Such irregularity could be attributed to the fact that phosphoproteomics approaches using generic anti-phosphotyrosine antibodies for purification at the protein level are influenced by protein-protein interactions. In contrast, immunoprecipitation using protein-specific antibodies in principle enriches the entire population of the protein of interest and presents it for a more accurate determination of phosphorylation level following immunoblotting with antiphosphotyrosine antibodies. With respect to this, phosphoproteomics involving purification at the peptide level as conducted by other groups (20,22,26) might circumvent the above mentioned limitation and offer a more accurate quantification of the phosphorylation levels of proteins. However, the commonly encountered difficulty of ionizing negatively charged phosphopeptides may present another challenge to accurate quantification.

FIG. 6. Biological interaction networking of tyrosine phosphorylated proteins identified in pervanadate-induced phosphotyrosine proteome in MCF10AT cells. Proteins identified in Table I were imported into PathwayAssist, and an interaction map was generated with information from ResnetCore of Ariadne Genomics, Kyoto Encyclopedia of Genes and Genomes pathway database, Biomolecular Interaction Network Database, and PathArt breast cancer category from Jubilant Biosys. Each node represents either a protein entity or a control mechanism of the interaction. The legend of the interaction network is summarized on the right of the figure. SPAG9, TOLLIP, NSFL1C, and WBP2, which were identified as novel tyrosine phosphorylation substrates of EGF signaling, are highlighted on the map. The biological network can be accessed on the Internet (see Footnote 2). SFN, 14-3-3 protein sigma; FLNB, splice isoform 2 of filamin B; DBNL, drebrin-like protein; TUBB, tubulin, beta polypeptide; CTTN, cortactin; PXN, splice isoform gamma of paxillin; JUP, junction plakoglobin.

2 The BIN created in this study will be available on the Internet for public access at the time of publication of this study. The URL can be obtained by contacting the corresponding author. The web page contains hyperlinks through which the information on the proteins and the nature of their interactions with each other are provided. Where available, the references including a description of the experimental findings on which the BIN was based are also included. Such protein networking information should be useful for formulating testable hypotheses to understand the function of novel tyrosine kinase substrates.
Although several groups have conducted phosphoproteomics using various strategies, and some potentially novel phosphoproteins have been identified, very few have been validated (19-26). In contrast, this study identified and validated seven proteins, SPAG9, CYFIP1, RPS2, TOLLIP, SLC4A7, WBP2, and NSFL1C, to be authentic tyrosine kinase substrates. In addition, SPAG9, WBP2, TOLLIP, and NSFL1C were demonstrated to be authentic tyrosine phosphorylation targets of EGF signaling. Coupled to the three proteins that we discovered in our previous study (13), namely Endofin, DCBLD2, and CEP68, a total of seven novel phosphotyrosyl targets of EGF signaling have been discovered by our group. Further studies on these proteins will eventually lead to new insights on EGF signal transduction. Four of the seven novel tyrosine kinase substrates, NSFL1C, WBP2, TOLLIP, and SLC4A7, displayed differential expression in the MCF10AT model at the protein (TOLLIP and SLC4A7) or mRNA (WBP2 and NSFL1C) level. Differential expression of TOLLIP and SLC4A7 was subsequently validated in clinical breast cancer samples. Although the sample size tested (n = 25) was not large, it sufficed to validate the in vitro observations made in this study. Consistent with the MCF10AT model, nearly two-thirds (64%) of the breast cancer cases analyzed displayed a diminished level of SLC4A7 compared with normal tissues. In contrast, only 25% of the cases revealed increased expression of TOLLIP when normal cells become cancerous. Although this supports the observation made in the MCF10AT model, the role of TOLLIP in breast cancer development is molecularly heterogeneous because 35% of the cases showed the reverse trend. The reason for the divergence of phenotypic expression is not clear, although it is likely to be influenced by other molecular determinants, e.g. p53, leading to polyclonal expansion and heterogeneity in cancer cell populations.
Further investigations with a larger sample size and correlation with clinical information will be conducted in a separate study to determine the potentially novel clinical utility of these candidate proteins for diagnosis and/or prognosis.
Little is known about the TOLLIP protein except that it is a negative regulator of the NF-κB and Toll signaling pathways during inflammation and innate immune response (38,39). It has also been implicated in the trafficking of ubiquitinated proteins and IL-1 receptor (40,41). The association of chronic inflammation and cancer has long been observed. Moreover inflammatory cytokines and chemokines such as tumor necrosis factor have been implicated in cancer progression and chemoresistance (42). Hence it is conceivable that TOLLIP might regulate cancer cell development. The role of SLC4A7 in cancer biology is less obvious. It is a transporter for bicarbonate (HCO₃⁻), which is a by-product of energy production. HCO₃⁻/CO₂ forms the major pH buffering system of our bodies. SLC4A7 probably facilitates HCO₃⁻ influx and efflux leading to cellular alkalinization and acidification, respectively. Interestingly magnetic resonance has shown that although the intracellular compartment of tumor is kept at neutral pH the extracellular compartment is almost always more acidic by about 0.2 pH units (43). The scientific reason for the more acidic microenvironment in tumor is not known although excessive production of lactic acid by tumors during hypoxia and the greater distance required for the H⁺ to diffuse to capillaries in tumor compared with normal tissue leading to increased acidity have been suggested. It has also been hypothesized that hypoxia and acidity may contribute to cancer progression by providing a selection pressure for active selection of cancer phenotype, chemoresistance, and metastatic behavior (44). In fact, one of the popular alternative therapies for cancer but also one of the least understood concepts of nutrition is alkaline therapy through diet.
Our observation on the loss of SLC4A7 bicarbonate transporter in the majority of the breast cancer cases generates a testable hypothesis that loss of SLC4A7 by malignant cells may contribute to acidity in the microenvironment that confers a growth and development advantage to the cancer cells but that is unsuitable for normal or more differentiated cells. This is the first time SLC4A7 and TOLLIP have been validated as novel tyrosine kinase substrates that are also implicated in human cancer development. Discovery of these phosphoproteins might not have been possible via a conventional proteomics approach. Our study therefore demonstrated the power of phosphoproteomics and the model systems used here in identifying novel tyrosine phosphorylated proteins as potential agents in cancer initiation and progression. Further molecular and functional studies will certainly provide novel insights into the roles of TOLLIP and SLC4A7 in the molecular etiology of cancer. Moreover detection of aberrant expression of TOLLIP and SLC4A7 in preneoplastic lesions suggests that they represent potential biomarkers that could complement mammography and histopathology, which are fraught with uncertainties, for screening and early detection of breast cancer.