A Systems-level Characterization of the Differentiation of Human Embryonic Stem Cells into Mesenchymal Stem Cells*

In this integrative, multi-omics study we characterized the differentiation of human embryonic stem cells into mesenchymal stem cells using transcriptomics, quantitative MS-based proteomics and phosphoproteomics. Based on RNA-to-protein correlation, we determined a set of high confidence genes that are important to differentiation with AHNAK hypothesized to be a defining factor in MSC biology. Additional central findings include two distinct expression waves of developmental HOX genes as well as an AGO2-to-AGO3 switch in gene silencing. Graphical Abstract Highlights Integrative multi-omics study characterizing the differentiation from hESCs into hMSCs. Set of high confidence genes important in hESC to hMSC differentiation defined. Two distinct expression waves of HOX genes and a AGO2-to-AGO3 switch in gene silencing identified. AHNAK hypothesized as a defining factor in MSC biology. Mesenchymal stem/stromal cells (MSCs) are self-renewing multipotent cells with regenerative, secretory and immunomodulatory capabilities that are beneficial for the treatment of various diseases. To avoid the issues that come with using tissue-derived MSCs in therapy, MSCs may be generated by the differentiation of human embryonic stems cells (hESCs) in culture. However, the changes that occur during the differentiation process have not been comprehensively characterized. Here, we combined transcriptome, proteome and phosphoproteome profiling to perform an in-depth, multi-omics study of the hESCs-to-MSCs differentiation process. Based on RNA-to-protein correlation, we determined a set of high confidence genes that are important to differentiation. Among the earliest and strongest induced proteins with extensive differential phosphorylation was AHNAK, which we hypothesized to be a defining factor in MSC biology. We observed two distinct expression waves of developmental HOX genes and an AGO2-to-AGO3 switch in gene silencing. Exploring the kinetic of noncoding ORFs during differentiation, we mapped new functions to well annotated long noncoding RNAs (CARMN, MALAT, NEAT1, LINC00152) as well as new candidates which we identified to be important to the differentiation process. Phosphoproteome analysis revealed ESC and MSC-specific phosphorylation motifs with PAK2 and RAF1 as top predicted upstream kinases in MSCs. Our data represent a rich systems-level resource on ESC-to-MSC differentiation that will be useful for the study of stem cell biology.


In Brief
In this integrative, multi-omics study we characterized the differentiation of human embryonic stem cells into mesenchymal stem cells using transcriptomics, quantitative MS-based proteomics and phosphoproteomics. Based on RNA-to-protein correlation, we determined a set of high confidence genes that are important to differentiation with AHNAK hypothesized to be a defining factor in MSC biology. Additional central findings include two distinct expression waves of developmental HOX genes as well as an AGO2-to-AGO3 switch in gene silencing.

Highlights
• Integrative multi-omics study characterizing the differentiation from hESCs into hMSCs.
• Set of high confidence genes important in hESC to hMSC differentiation defined.
• Two distinct expression waves of HOX genes and a AGO2-to-AGO3 switch in gene silencing identified.
• AHNAK hypothesized as a defining factor in MSC biology.
Mesenchymal stem/stromal cells (MSCs) 1 are self-renewing multipotent cells with regenerative, secretory and immuno-modulatory capabilities. They thus have great potential in cell-based therapy and have proven beneficial for the treatment of various diseases (1). MSCs may be isolated from multiple adult tissues with bone marrow as the most common source. Use of adult tissue-derived MSCs in therapy, however, is burdened with complications including inhomogeneity and senescence (2). Highly proliferative embryonic stem cellderived MSCs (ESC-MSCs) have been proposed as an alternative source with benefits to availability, biosafety and standardized therapy (3). MSCs may be generated from ESCs or induced pluripotent stem cells (iPSCs) by protocols ranging from co-culture with OP9 cells, selection of mesenchymal-like cells after undirected differentiation, to directed differentiation using small molecular compounds (4 -7). Here we improve our earlier published protocol, providing robust and reproducible differentiation (8). At the core of the improved protocol is the efficient initiation of ESC differentiation by bone marrow-derived MSC (BM-MSC)-conditioned medium. Despite the strong interest in MSCs over the last decade, their developmental origin remains unclear. Mesoderm commitment and differentiation is considered the major source of MSCs, and of the three main mesodermal branches, hematoendothelial (CD34ϩ/KDRϩ/PDGFR␣ dim/neg; blood, endothelium), cardiovascular (CD34neg/KDRϩ/PDGFR␣ϩ; endothelium, cardiomyocytes, smooth muscle) and mesenchymal (CD34neg/ KDR dim/neg/PDGFR␣ϩ/CD73ϩ; fibroblasts, bone, cartilage, fat) (9), the latter is thought to give rise to MSCs.
Several large-scale studies including transcriptome, histone and/or DNA methylation have explored mesoderm commitment and differentiation (10 -12). Advances in quantitative mass spectrometry based proteomics and phosphoproteomics have also led to investigation of system-wide ESC pluri-potency and differentiation (13)(14)(15)(16)(17), lineage commitment, as represented by the differentiation of ESCs to MSCs, has, however, proteomicly been studied only for the neuronal lineage (16).
Using an integrative multi-omics approach combining transcriptome, proteome and phosphoproteome profiling of MSC differentiation from hESCs we here present a comprehensive view of MSC development yielding insights into molecular mechanisms, reveal players in MSC biology and include a phosphoproteomic angle, which lays a foundation to decipher the evolution of molecular events during differentiation. We extracted affected biological functions, signaling pathways, and differentiation potential in developing MSCs and identified important transcription factors, kinases and phosphatases, as well as noncoding transcripts associated with the process.

EXPERIMENTAL PROCEDURES
Cell Culture and Differentiation-Permission to use the human ESC line (hESC) ES04 (WiCell institute, Madison, WI) was obtained from the Cornell/Rockefeller/Sloan Kettering tri-institutional ESC research oversight committee. Funding was secured from nonfederal, USexternal funding sources. ES04 were expanded feeder-free in mTeSR1 medium (Stem Cell Technologies, Vancouver, Canada) supplemented with penicillin/streptomycin on growth factor-reduced matrigel (Stem Cell Technologies). Cells were passaged 1:6 at 80% confluence using 1 mg/ml dispase. Medium was changed daily. Differentiation into MSCs was initiated at 10% confluence using conditioned MesenCult medium (Stem Cell Technologies) mixed 1:1 with unconditioned MesenCult medium. On day 6 of differentiation cells were passaged to noncoated flasks 1:3 using accutase, with an additional passage on day 12. All following passages were performed with 0.05% EDTA-trypsin. Differentiation medium was replaced every 3 days until day 18. Starting from differentiation day 19 normal Mes-enCult medium was used. Fully differentiated ESC-derived MSCs were obtained on day 30. Based on microscopic observation and FACS data of MSC markers the purity of ESC-derived MSCs from day 30 was around 90 -95% and comparable to BM-MSCs. Bone marrow-derived MSCs (BM-MSCs) were expanded in MesenCult medium supplemented with penicillin/streptomycin. BM-MSCs were passaged 1:3 at 80% confluence using 0.05% trypsin with EDTA-trypsin. Medium was collected after 3 days as conditioned medium for ESC to MSC differentiation. Conditioned medium was cleared from cellular debris by centrifugation (300 ϫ g, 10 min). BM-MSCs were purchased (Lonza, Basel, Switzerland, StemCell technologies) with the following donor details: 40y/m (MSC-001F, lot#BM2893), 39/m (PT2505, lot#1F3422), 27y/m (PT2505, lot#318006). To study the differentiation samples were harvested at 8h, after 1d, 2d, 5d, 15d and 30d. Undifferentiated ESCs and ESC-MSCs were harvested with dispase until day 2, with accutase until day 15 and at day 30 with trypsin according to the sampling schedule (Fig. 1). One limitation of the study is the use of a single hESC line for the ESC to MSC differentiation experiments.
Experimental Design and Statistical Rationale-All experiments were performed independently in biological triplicates. Differentiation was replicated using ESCs from different passages. For each differ-entiation experiment conditioned medium was obtained from independent BM-MSC donors (purchased from Lonza, Stem Cell technologies), also included per experiment as end point controls. Samples for transcriptome, proteome and phosphoproteome analysis were obtained according to the harvesting schedule in Fig. 1. Differential expression analyses for all three omics approaches were performed with the limma R package using the empirical Bayes moderated t test after ensuring normal distribution. For each time point all biological triplicates (n ϭ 3) were included in the statistical analysis. Features with a FDR-corrected (Benjamini-Hochberg) p value Ͻ 0.05 were regarded as differentially expressed. Features with less than 50% completeness were excluded from the analysis.
Flow Cytometry-Cells were labeled with fluorescence-conjugated antibodies against CD73, CD105, and CD45 (BD Biosciences, San Jose, CA). Matching isotype controls were used as staining controls. Samples were measured on FACSAria II and Fortessa (BD Biosciences) with 10,000 events per acquisition. Data was analyzed with FACSDiva v6.3 (BD Biosciences).

Transcriptome: Next Generation RNA Sequencing
Sample Preparation for RNA-seq-Following RNA isolation with TRIZOL, 10 ng of total RNA was used to generate RNA-seq libraries using the Ovation Single Cell RNA-Seq System (Nugen Technologies, Inc., San Carlos, CA) following the manufacturer's protocol. The final libraries were quantified using the Agilent Bioanalyzer High Sensitivity DNA (Agilent, Santa Clara, CA) assay then pooled using 8 libraries per pool. Paired-end 100 bp sequencing was carried out on the Illumina HiSeq 2500 (Illumina, Inc., San Diego, CA).
Data Analysis for RNA-seq-Quality check was performed for the raw reads with FastQC (version 0.10.1). The 100 bp paired reads were mapped to the human reference genome from GENCODE built GRCh38 patch release 25 with STAR (v 2.5.1a) using Ensembl 87 gene annotation. Aligned reads were quantified with featureCounts function from Rsubread (v 1.24.2), Bioconductor Package in R (v 3.3.3). All the read counts from conditions were combined into a data matrix based on gene identifiers. Differential expression was based on RNA count data after voom (20) and inverse normalization using empirical Bayes moderated t test (21) within the autonomics package (manuscript in preparation; https://github.com/bhagwataditya/ autonomics).

Proteome and Phosphoproteome
Protein Preparation, Reduction, Alkylation, Digestion, Labeling-Cells were lysed in 2% SDS buffer containing 30 mM Tris, pH 8.5, protease inhibitors (Roche, Basel, Switzerland), phosphatase inhibitors (Roche), and benzonase (Sigma, 2 l per 1 ml). After methanol/ chloroform precipitation samples were resuspended in urea/thiourea buffer (6 M/2 M, 30 mM HEPES, pH 8). Protein concentration was determined by Bradford. After reduction (1 mM DTT, 30 min) and alkylation (5 mM IAA, 20 min, in the dark), proteins were digested at a protein to enzyme ratio of 100:1 first by endopeptidase Lys-C (Wako Chemicals, Richmond, VA) for 3 h and then by trypsin overnight after a 1:4 dilution with 10 mM tetraethylammonium bromide, pH 8. Peptides (200 g) were labeled by reductive dimethylation. Time points were labeled as "medium" or "heavy," a combined standard representing a mixture of all samples as "light." Dimethyl labeled samples were combined as triplex and desalted on R3 columns. Labeled eluted peptides were split into equal aliquots for proteome and phosphoproteome analysis with 300 g total peptides each.
Peptide Fractionation by In-solution Isoelectric Focusing-Peptides for proteome analysis were separated by in-solution isoelectric focusing (OFFGEL fractionator, Agilent) into 12 fractions within the pH range 3-10 according to manufacturer's instructions. The ampholyte and glycerol concentration were reduced to 0.1% and 0.3%, respectively (23). Peptides were harvested after 20 kVh focusing with an additional well wash of 50:49:1 methanol/dH2O/TFA. Peptides were desalted on reversed phase (RP) C 18 STAGE Tips.
Phosphopeptide Enrichment-Phosphopeptides were enriched according to the TiSH protocol (24), a combination of TiO 2 and sequential elution from IMAC enrichment combined with HILIC fractionation. Briefly, 300 g dimethyl-labeled peptides were equilibrated in 1 ml loading buffer (80% acetonitrile, 5% TFA, 1 M glycolic acid) and phosphopeptides were enriched twice with 1.8 and 0.9 mg TiO 2beads. Phosphopeptides were eluted from the beads with ammonia solution (pH 11.3) and cleaned on R3 column. Phosphopeptides were evaporated to dryness, reconstituted in 50% acetonitrile, 2% TFA (pH Յ 0.83) and added to washed IMAC beads. After 30 min low speed vortexing at RT unbound phosphopeptides were combined with eluates at 50 and 20% acetonitrile in 2% TFA. These monophosphorylated peptides were subjected to a second TiO 2 enrichment using 1.8 mg and 0.9 mg beads. Multi-phosphorylated peptides were eluted from the IMAC beads with ammonia solution (pH 11.3). Monoand multiphosphorylated peptides were cleaned on R3 columns before nanoLC-MS/MS analysis. Mono-phosphorylated peptides were further fractionated by HILIC (see below).
Hydrophilic Interaction Liquid Chromatography (HILIC) Fractionation-Mono-phosphorylated peptides were reconstituted in 0.45 l 10% TFA and 3.5 l H 2 O and 40 l 100% acetonitrile were added. Samples were loaded onto an in-house packed TSKgel Amide 80 HILIC 320 m ϫ 170 mm capillary HPLC column using an Agilent 1260 HPLC system. Mono-phosphorylated peptides were eluted over a 48 min gradient starting from 90% acetonitrile, 0.1% TFA to 0% acetonitrile, 0.1% TFA. Fractions were collected automatically in oneminute intervals in a 96-well plate. According to the UV detection fractions were combined into 16 final fractions. Before nanoLC-MS/MS analysis, samples were dried by vacuum centrifugation and reconstituted in 0.5 l of 100% formic acid followed by 4.5 l H 2 O.

Mass Spectrometry
Peptide samples were analyzed by liquid chromatography (LC) using an EASY nLC-II coupled to a Q Exactive mass spectrometer (Thermo Scientific, Bremen, Germany) as previously described (25). Briefly, peptides were separated on 120 min gradients and mass spectra were acquired in data dependent mode (Top10), using higher energy collisional dissociation (HCD) (normalized collision energy 25) for fragmentation. Precursor scans (MS1 level) were acquired at a resolution of 70,000 (m/z 300) and an AGC target value of 3,000,000 charges (maximum ion injection time 20 ms). Fragmentation spectra were acquired at a resolution of 17,500 (m/z 300) and an AGC target value of 100,000 charges (maximum ion injection time 120 ms). All scan events were recorded in profile mode. A dynamic exclusion list of 25 s was employed and the exclude isotopes functionality was activated.

Data Analysis for Proteome and Phosphoproteome
Mass spectrometric raw data was processed and analyzed using MaxQuant v.1.5.0.0 combining proteome and phosphoproteome measurements. In total, 319 MS runs were included with 29 fractions per experiment (proteome: 12ϫ IEF, phosphoproteome: 16ϫ HILICmonophosphorylated peptides, 1ϫ multiple phosphorylated peptides). The following default search parameters were employed: enzyme specificity trypsin, maximum of 2 missed cleavages, first search mass accuracy tolerance 20 ppm, main search mass accuracy tolerance 4.5 ppm, FTMS MS/MS tolerance 20 ppm, minimum peptide length of 7 amino acids, peptide spectrum match FDR and protein FDR both set to 0.01 as calculated by the revert database approach. Protein quantification was based on a minimum of two ratio counts, originating from unique or razor peptides only. Additionally, unless explicitly stated otherwise, other parameters of the data analysis were not changed from their MaxQuant 1.5.0.0 default value. Search was performed with the match between run and re-quantify options set as TRUE. As fixed modifications cysteine carbamidomethylation was selected, as variable modifications: (STY) phosphorylation, asparagine deamidation and methionine oxidation. Samples were searched as triplex (multiplicity 3) with the corresponding dimethyl labels selected (light: DimethLys0, DimethNterm0, medium: Dime-thLys4, DimethNterm4, heavy: DimethLys8, DimethNterm8). Andromeda searches were performed against Uniprot Homo sapiens proteome database (canonical including isoforms) downloaded Aug 2014 (68,382 entries). Differential expression was performed on log-transformed normalized ratios extracted from the MaxQuant protein-Groups.txt and Phosho(STY)Sites.txt tables using empirical Bayes moderated t test from limma (21), implemented in the in-house built autonomics R package (manuscript in preparation; https://github. com/bhagwataditya/autonomics).

Bioinformatics
Linear Kinase Motif Analysis-Centered phosphomotif sequences were uploaded to PhosphoSitePlus to test for significantly overrepresented motifs and to generate sequence logos. Upstream kinases were predicted by the NetworKIN algorithm implemented in the KSEAapp package (26). To generate a combined heatmap for all sample groups the NetworKIN cutoff was set to 5, with a minimum of 10 members per kinase and p value Ͻ 0.05.
Network Generation-The Phosphopath app (27) within Cytoscape v3.3 was used to visualize phosphoprofiling data in combination with the proteome for selected enriched pathways. Protein-protein interaction networks for clustered features were generated by STRING to determine hub proteins.
Enrichment Analysis-Enrichment analysis was performed with DAVID implemented in the in-house built autonomics R package (manuscript in preparation; https://github.com/bhagwataditya/ autonomics).
Clustering-Unsupervised clustering analysis (fuzzy c-means) on differentially expressed features (early time points: p value Ͻ 0.01, otherwise: FDR Ͻ 0.05) was performed using the Mfuzz R package (28). Fig. S1A), two classical MSC markers (CD73, CD105) were longitudinally moni-tored by FACS analysis (supplemental Fig. S1B). At the final stage of differentiation, both show levels comparable to BM-MSCs. As expected, the hematopoietic marker CD45 was absent at all time points. Additionally, ESC-derived MSCs (day 30, D30) displayed tri-lineage differentiation potential (adipocytes, osteocytes, and chondrocytes) equivalent to BM-MSCs (supplemental Fig. S1C), thus satisfying the minimal criteria for defining multipotent MSCs (31). ESC, MSC, and Mesoderm Markers-The expression of classical ESC and MSC markers by RNA-seq and proteomics followed the expected pattern and further validated the differentiation protocol (Fig. 1C). Known mesoderm markers such as Brachyury (T) and EOMES peaked together with BMP2, HAND1 and TBX20 at the intermediate differentiation stage (D15) whereas other mesoderm markers such as FOXC2, GATA6, PDGFRA, PRRX1, SNAI1, and TWIST1 plateaued during differentiation. With the observed downregulation of KDR, the ESC-MSCs match the criteria for the mesenchymal branch of mesoderm commitment (CD34neg/KDR dim/neg/PDGFR␣ϩ/CD73ϩ) (9) (Fig. 1D, supplemental Fig.  S1D). Based on RNA-seq analysis, several markers showed persistent expression as early as 8h after induction of differentiation. Functionally, these are related to focal adhesion and differentiation. Among them are recognized MSC markers (PLAU, CD44), the serine/threonine kinase (PLK2) and a multifunctional scaffold protein with a role in early ESC fate determination (AHNAK) (75) (Fig. 1E).

Multi-omics
Analytical Depth and Differential Expression Analysis-RNA-seq analysis detected an average of 95,038 transcripts, covering 17,665 genes (min: 11,800/max: 21,565). Of these 10,847 were present globally with read counts, 10,264 of which were annotated as protein-coding. Proteome profiling led to the identification of 9,470 proteins (supplemental Information SI 1), 7864 of which were quantified. Supplemental Fig. S3A summarizes their distribution across the dataset. A total of 4325 proteins were quantified in all samples and stages. We additionally identified 13,826 phosphosites (sup-plemental Information SI 2) mapping to 3621 phosphoproteins. Of the phosphosites 8390 had a localization probability of Ͼ0.75 (class I; 3,211 proteins) and yielded a STY distribution of 92:8:0. Conservatively, statistical analysis was performed on quantified class I sites following proteome normalization, which reduced the data set to 6204 sites (2398 proteins) (supplemental Information SI 3). The distribution of quantified class I phosphosites across the data set is shown in supplemental Fig. S3A with 1369 class I phosphosites quantified in all samples at all stages. Coverage and overlap between the three omics techniques is shown in supplemental Fig. S3B with 2269 features being quantified at transcriptome, proteome and phosphoproteome levels. Differentially expressed features (FDR Ͻ0.05) per technique and sample group are summarized in Fig. 2A based on comparisons to undifferentiated ESCs (D0). Examples of features continuously upregulated during differentiation are given in Fig. 1E, with AHNAK discussed in more detail below. Complete lists of differentially expressed features (FDR Ͻ 0.05) per technique are provided (supplemental Information SI 4 -9).
Functional Enrichment, Differentiation and Signaling Pathways-For downregulated genes, significant enrichment was observed for gene expression-related processes (Fig. 2B). Parallel to a general massive transcriptional downregulation, the observed protein interaction network for gene silencing by RNA (Fig. 3A) suggests a switch in argonaute isoforms, which as core components of the microRNA-induced silencing complex (miRISC) guide miRNAs to their targets. AGO3 appears to replace AGO1 and AGO2 during differentiation. Although originally AGO2 alone was believed to possess miRISC slicer activity, this has recently also been demonstrated for AGO3 in the context of specific miRNAs (32). AKT3-mediated phosphorylation on AGO2 S387 is necessary for miRISC assembly and interaction with the LIMD1 subunit (32). AGO3 can replace AGO2 independent of AKT3, recruiting LIMD1 proteins (LIMD1, WTIP, AJUBA). An AGO2 to AGO3 switch has been described in vitro following LIMD1 ablation (33). Our phosphorylation data supports this shift in the machinery (Fig. 3B) for both argonaute and LIM proteins, exemplified by increased phosphorylation from day 2 onwards on several positions on LIMD1 (S233, S272, S277, S316) and from day 5 on AGO2 (S387, S824). Although the shift from AGO2 to AGO3 during ESC to MSC differentiation is clear based on protein expression, the phosphorylation data suggests a complex regulatory interplay, that warrants further investigation.
Features upregulated during differentiation, display an enrichment for the terms vesicle-mediated transport, cytoskeleton organization, focal adhesion, extracellular matrix organization, development, differentiation and cell signaling (Fig. 2B). Filtering for developmental and differentiation terms reproduces our reported observation that MSCs express a plethora of tissue developmental markers (25), most notably for cardiovascular system, epithelium and neuron development (supplemental Fig. S2C).
Cell signaling pathways affected by differentiation were examined in detail (supplemental Fig. S2D). The enrichment of MSC signaling pathways such as integrin, EGF, VEGF, and TGF␤ signaling further confirms differentiation. Among the affected signaling-related pathways were tyrosine and serine/threonine kinase signaling, which together with enriched MAPK and PKB signaling justifies our approach studying the global phosphoproteome during ESC to MSC differentiation.
LncRNAs and Antisense-Long noncoding (lnc) RNAs are known to play a crucial role in both maintenance of pluripotency and differentiation of ESCs (34,35). During ESC to MSC differentiation 433 lncRNAs and 411 antisense transcripts were differentially expressed (FDR Ͻ 0.05) (supplemental Information SI 9, 10). Functional annotation is available for a few, but of these, CARMN, MALAT, NEAT1 and LINC00152 were among the most prominent induced lncRNAs. CARMN is associated with cardiomyocyte differentiation (36), MALAT1 with osteoblast differentiation (37), NEAT1 with adipogenesis (38) and LINC00152 with epithelial-mesenchymal transition (39). Commonly upregulated from D1 onward was lncRNA MIR4435-2HG, sequence homolog to LINC00152 (40). HOTAIR and HOTAIRM1 were among the most induced antisense transcripts during differentiation, both located in the HOX gene cluster. HOTAIR has been described as an important player in Snail-mediated epithelial-mesenchymal transition (41) and HOTAIRM1 has been linked to neurogenesis (42) (Fig. 2C). The antisense RNA MIAT found to be downregulated during differentiation, has been described as a suppressor of osteogenic differentiation in stem cells (43). LINC-ROR, also downregulated during differentiation, is involved in ESC selfrenewal by trapping miRNA145, which targets SOX2, NANOG and POU5F1 transcripts (44). Besides these annotated lncRNAs observed as changing during differentiation and reflecting the loss of pluripotency and the gain of MSC specific features, many other lncRNA candidates were found to be regulated and warrant future evaluation.
Clustering -Identification of MSC-defining Transcription Factors-Differentially expressed features were grouped using unsupervised fuzzy c-means clustering (Fig. 4). PCA of the clusters for all techniques separates up and downregulated features (Fig. 4A). Clusters of interest are those with intense expression changes (red -increase, blue -decrease) or displaying peak expression on D15 (green). (Fig. 4A, 4B). Enrichment analysis was performed on them and protein interaction networks were generated. Hub proteins as well as the most enriched terms are displayed together with the expression profiles of transcription factors per cluster in Fig. 4C. For RNA-seq, blood vessel development and skeletal system development are top induced terms with the transcription factors FOXC1, FOXC2, FOXD1, FOXP2, NR2F1, NR2F2, PITX1, PITX2, RUNX1, RUNX2, SIX1, SIX2, and TWIST1 most prominently increased during differentiation. Although all are related to cell development, only a subset (RUNX1, RUNX2, TWIST1) are reported to be important in MSC self-renewal, mesodermal tissue development and lineage commitment (45)(46)(47). Transcription factors with a prominent decrease over the time course include the pluripotency factors NANOG, SOX2 and ZSCAN10, as well as ARID3B, RCOR2, SALL4 and ZNF483, which were also identified by proteome profiling as strongly downregulated. Several transcription factors, many of which belong to the homeobox gene family, known to be master regulators of pattern specification and tissue development, display distinct peak expression at D15 and include HOXB2, HOXB4, HOXB5, HOXB6, HOXC8, HOXC10, and HOXD10. Others with a distinct expression peak are the mesoderm markers EOMES and Brachyury (T). Differentiationassociated HOX transcription factors are only found in three clusters, namely cluster 20, 7, and 31. Specifically clusters 7 and 31 contain transcripts with peak expression at intermediate differentiation stage D15, whereas cluster 20 contains transcripts with a prominent induction during differentiation.
Likely because of dynamic range limitations of the technology in combination with low protein abundance, HOX proteins were not covered by proteome profiling. Most prominent induced transcription factors found by proteomics include the PDZ and LIM domain containing proteins PDLIM2, PDLIM5, and PDLIM7, as well as RBCK1, FHL2, NR3C1 and PURA.
From this plethora of transcription factors involved in differentiation (RNA: 777, PROT: 185, PHOS: 46) (Fig. 4D) we selected sentinels with extreme expression based on clustering results as defining the MSC phenotype.
RNA to PROT Correlation-Because RNA expression is not always a proxy for protein expression especially during embryonic development (48), we were interested in gene-wise correlation profiles between both techniques across time points (supplemental Information SI 10). The distribution for 6903 intersecting features is depicted in Fig. 5A with average correlation of 0.5 and a mode at 0.9 demonstrating better correlation than previously published for somatic cells (0.4) (49)or differentiating cells (0.2) (48), but in line with our previous study (50). PCA analysis was performed on differentially expressed features, which display good correlation (R Ͼ 0.7 and FDR Ͻ 0.05) (Fig. 5B). Proteomics data now also shows the distinct intermediate differentiation stage at day 15. The top 30 genes for the first 3 components are shown as heatmaps with correlation indicated (Fig. 5C). Genes which best distinguish ESCs (D0 -5) from MSCs (D15-30, BM-MSCs) are described by principal component 1 and include e.g. SALL2, TIMELESS, ERBB3, CHEK1, and CHEK2 for ESCs; COL1A1, COL3A1, COL5A1, COL5A2, COL12A1, CD44, CD59, LOX, IGFBP7, and MAP1A for MSCs. Genes with expression peaking at the intermediate or late stage of differentiation are represented by principal component 2 and 3, respectively. We regard the set of 1,986 differentially expressed transcripts (RNA-seq) which display good correlation to the proteome (RϾ 0.7 and FDR Ͻ0.05) as high confidence genes involved in ESC to MSC differentiation. This gene set includes 105 transcription factors, 65 kinases and 24 phosphatases (Fig. 5D). Among upregulated transcription factors during differentiation are PDLIM proteins (PDLIM2, 4, 5, 7), PHC2, NR3C1, and PURA. The PDLIM proteins, although not yet associated with MSC physiology, are implicated in MSC related differentiation processes such as osteogenesis and cardiomyogenesis (51,52).
To summarize, around 2,000 differentially expressed features are well correlated between RNA to protein (R Ͼ 0.7 and FDR Ͻ 0.05), allowing to cross-validate findings and focus on MSC specific signature gene products. Phosphoproteome Profiling: Differentiation/Pluripotency and Nuclear Pore-Comparing our data set to PhosphoSite-Plus (as of 08.2017, 234,166 entries for H. sapiens), 301 phosphosites are novel when amino acid position is extracted from canonical protein forms. Novel phosphosites were e.g. found for the transcription factors DACH1 (S392), FOXH1 (S16), NLRP10 (S585), SALL2 (T467), SALL3 (S917), SALL4 (S126, S129, S512), and TCEAL2 (S114). Knowledge about the human MSC phosphoproteome is limited to just 700 sites reported in PhosphoSitePlus. When comparing our data set with reported sites for hESCs and hMSCs, 5,157 phosphosites are novel (Fig. 6A). Among the proteins most extensively phosphorylated (phosphosites per protein) in our data (Fig.  6B) were proteins with the strongest induction of expression during differentiation (AHNAK, VIM, MAP1A, MAP1B) (Fig. 6B, 6C) and extensive differential phosphorylation (AHNAK, VIM, SRRM2, MAP1A, MAP1B) (Fig. 6, supplemental Fig. S3C). When ESCs differentiate and loose pluripotency, massive changes are observed for cell cycle, gene expression and RNA processing (both coding and noncoding). At early differentiation (D1) JARID2, a member of the PRC2 complex essential for histone methyltransferase recruitment and initiating ESC differentiation (53) becomes phosphorylated (S124). This observation is paralleled by a decrease of JARID2 RNA expression over time, as well as of EZH2, a critical subunit of the PRC2 complex (supplemental Fig. S4A). EZH2 decrease has been shown to be specific to mesoderm commitment by reducing H3K27me3 (54). AHNAK, mentioned above as a transcript strongly and stably induced from as early as 8 h of differentiation, is the top phosphorylated protein at D1. It is one of the most phosphorylated proteins in homo (source: PhosphoSitePlus) as well as in our data with strong phosphorylation changes on multiple sites (Fig. 6D). The transcriptional co-repressor TRIM28 is similarly phosphorylated starting with an early differentiation stage (D2: S19, S473, D5: S473, T544), followed by decreased expression at intermediate differentiation stages, which supports recent findings that TRIM28 expression is indispensable to pluripotency and repression of differentiation-inducible genes (55). Also strongly affected by phosphorylation changes from early differentiation on are proteins involved in RNA and transcription factor transport (e.g. nucleoporins NUP188, NUP50, NUP88) (Fig. 3A, 3B), RNA splicing (e.g. SRFS1, SRFS6, SRFS7, SRFS9, SRFS10, SRFS11, SRRM1) and gene expression in general with a vast majority of proteins associated with RNA-binding activity. Nuclear pore complexes consist of Ͼ30 nucleoporins (NUPs) with a tissue-specific composition. NUPs have recently been discussed as scaffolds regulating developmental gene expression (56). Together with the massive downregulation of specific NUPs during ESC to MSC differentiation described above, we observed bidirectional phosphorylation changes (e.g. increase in NUP50 S192, NUP88 S35, NUP188 S1709, decrease in NUP153 T699) (Fig. 3A, 3B), hinting at a more complex regulatory picture.
Phosphorylation Motif Analysis-Differentially phosphorylated peptide sequences were subjected to motif analysis. Phosphomotif counts per time point are shown in Fig. 6E, separately for increased and decreased phosphorylation. Among the 37 motifs thus defined as specific for MSCs were most prominently sXXXXXE, sXD, sXE, and RXXsXS, whereas ESC-specific motifs were sXXXXXK, sXXK, sPXK, and sPS. As compared with the PhosphoSitePlus compendium of experimentally identified phospohosites, the most prominently enriched motive in ESCs was sPXK recognized by CDKs (CDK1,2,5), reflecting the proliferative nature of the cell type. ESC-specific motifs tended to basic c-termini, enriched in K and R (Fig. 6E, 6F). MSC-assigned motifs where mostly of an acidic nature c-terminal to the phosphorylated amino acid with high abundance of D and E, possibly recognized by the acidophilic kinase PLK2 (57), found to be upregulated by RNA-seq from 8 h onward.

DISCUSSION
In this integrative multi-omics study we present a detailed picture of the differentiation of hESCs into MSCs, covering differential expression of protein coding and noncoding ORFs as well as cell signaling events by examining the transcriptome, proteome and phosphoproteome over a 30-day time course. From a bird's eye view, focal adhesion, cytoskeleton organization and extracellular matrix formation are the earliest and most prominent biological terms induced during MSC differentiation, paralleled by a downregulation in DNA replication, RNA splicing, processing and chromosome organization.
Mesoderm markers such as Brachyury, EOMES, and BMP2 are peaking at an intermediate stage, verifying mesoderm lineage commitment. We also observed downregulation of epigenetic regulators including the PRC2 component EZH2, which has been found to promote mesoderm development specifically by reducing H3K27me3 (54). In a recent paper by the Weissman group, a roadmap of 12 mesodermal lineages was provided with detailed expression analysis from single cell RNA-seq (10). Based on mapping our data onto markers from that work, MSCs appear to share lineage characteristics with lateral mesendoderm (PRRX1, HAND1) derived cardiac mesoderm (GARP/LRRC32, NKX2-5). MSCs show high expression of GARP/LRRC32 and transient expression of NKX2-5 (D15 peak). In addition, they express markers of early somites (FOXC2, PDGFRA high ), also defining paraxial mesoderm. This suggests that MSCs may represent the end point to another branch of the developmental roadmap between lateral and paraxial mesoderm (supplemental Fig. S5).
An early step in ESC to MSC differentiation includes the induction of AHNAK. Over 50 phosphorylation sites were identified for AHNAK, 9 of them differentially phosphorylated with early induction at D1 (S1068, S5735, S5782). AHNAK mediates TGF␤-induced downregulation of SMAD3 target genes such as pluripotency factor c-Myc (62), and accordingly iPSC generation is more efficient from AHNAKϪ/Ϫ MEFs. AHNAK is involved in adipogenesis by SMAD1 activation (63), plays a role in vascular healing and vesicle formation (64). Although not described as being functional important in MSCs, all those features link AHNAK to MSC physiology and lineage commitment. During embryonic development AHNAK is expressed in a tissue-specific manner in migratory mesenchyme and tissues which undergo epithelial mesenchymal transitions (65), placing it as an important protein in developing MSCs, most probably related to focal adhesion, one of the earliest enriched biological functions.
In general, many of the highest induced transcription factors belong to the homeodomain family, which are important in body plan pattern specification, organ development and cell fate (66). Developmental HOX transcription factors (HOXA2, HOXA5, HOXA13, HOXB3, HOXB4, HOXB5, HOXB6,  HOXB9, HOXD9, HOXD10, HOXD11) and modulators of cell fate and body patterning cluster with pan-mesodermal markers characterized by transitional expression at an intermediate differentiation stage. Functionally the clusters are related to embryonic pattern and posterior/anterior specification. The concept of transient HOX expression is not new (67) and has been described e.g. for the 5Ј-end of the HOXA cluster during monocyte to macrophage differentiation (68).
Another group of HOX transcription factors (HOXA9, HOXA10, HOXA11, HOXB2, HOXC4, HOXC6, HOXC8, HOXC9, HOXC10) cluster together with the highest induced transcripts, are functionally related to skeletal system development, with TWIST1 as hub protein and the only transcription factor that to date has been functionally associated with MSC fate (45). Based on the assumption of a tissue specific HOX code (69) and their own HOX profiling, Klein and colleagues recently efficiently initiated the differentiation of iPSCs into vascular wall resident MSCs by a lentiviral induced combination of HOXB7, HOXC6 and HOXC8 (70), demonstrating the impact of HOX genes and partially supporting our findings on HOXC6 and HOXC8. HOX expression in MSCs from different origins has been investigated (63), linking our cells to BM-MSC-like cells. However, our data also indicates HOX expression to be a dynamic process, exemplified by a transient induction at the intermediate stage.
When ESCs loose pluripotency, massive gene repression and silencing follows. Interestingly, we observed a considerable increase in the argonaute protein AGO3, suggesting an AGO2 to AGO3 switch, likely regulated by phosphorylationdependent recruitment of LIM domain-containing proteins (71).
Phosphoproteome data with subsequent KSEA analysis suggested PAK2 and RAF1 as top active kinases during ESC to MSC differentiation supported by kinase autophosphorylation pattern. PAK2 is a potential new player in MSC differentiation and physiology, connecting TGF␤ signaling and SMAD activation leading to cell proliferation. Motif analysis clearly showed ESC-specific phosphorylation motifs to be basic, whereas MSC-specific ones were acidic C-terminal to the phosphorylated amino acid. For ESCs this has been known because the cell cycle modulators CDK1 and CDK2, which recognize the sPXK motif, have been reported as the most active kinases in hESCs, being responsible for more than 25% of the ESC phosphoproteome (17). The characteristic found for MSC phosphomotifs is, however, a new finding and may be associated with the kinases found to be induced.
In summary, we present here a time-resolved analysis of the differentiation on hESC to hMSC using a systems-level approach combining transcriptome, proteome, and phosphoproteome profiling. Despite the strong interest in MSCs for stem cell-based therapies, the cell type is underrepresented in omics studies. With this data we provide a rich repository on ESC and MSC biology, specifically adding knowledge to the phosphoproteome of MSCs. We identify important players in the differentiation process, including transcription factors, kinases, phosphatases and noncoding transcripts.
Acknowledgments-J.G. and the Proteomics Core at WCM-Q are supported by "Biomedical Research Program" funds at Weill Cornell Medicine -Qatar, a program funded by Qatar Foundation.

DATA AVAILABILITY
The mass spectrometry proteomics data, including all raw files and data analysis tables produced by MaxQuant, have been deposited to the ProteomeXchange Consortium (http:// proteomecentral.proteomexchange.org) via the PRIDE partner repository (29) with the dataset identifier PXD004652. Individual spectra of the PRIDE-deposited data can be visually interrogated with MS viewer (30) (search key: xjvcljyla8). RNAseq data have been deposited to the Sequence Read Archive (SRA) with the dataset identifier PRJNA507603. URLs for transcriptomics (https://www.ncbi.nlm.nih.gov/sra/PRJNA507603) and for proteomics/phosphoproteomics (https://www.ebi. ac.uk/pride/archive/projects/PXD004652). * This work was partially supported by a grant from Qatar National Research Fund's National Priority Research Program (4-1267-1-194). The statements made herein are solely the responsibility of the authors. We thank the Flow Cytometry Facility within the Microscopy Core at Weill Cornell Medicine -Qatar for contributing. We also thank the Genomics Core at WCM-Q for carrying out the RNA-seq experiments. Both cores are supported by "Biomedical Research Program" funds at Weill Cornell Medicine -Qatar, a program funded by Qatar