Integrated transcriptomic and proteomic analysis of human eccrine sweat glands identifies missing and novel proteins

The eccrine sweat gland is an exocrine gland that secretes sweat to control body temperature. Malfunction of the sweat glands can result in disorders such as miliaria, hyperhidrosis and bromhidrosis. Understanding the transcriptome and proteome of sweat glands is important for understanding their physiology and role in diseases. However, no systematic transcriptome or proteome analysis of sweat glands has yet been reported. Here, we isolated eccrine sweat glands from human skin by microdissection and performed RNA-seq and proteome analysis. In total, ~138,000 transcripts and ~6,100 proteins were identified. Comparison of the RNA-seq data of eccrine sweat glands to other human tissues revealed the closest resemblance to the cortex region of kidneys. The proteome data showed enrichment of proteins involved in secretion, reabsorption, and wound healing. Importantly, protein-level identification of the calcium ion channel TRPV4 suggests the importance of eccrine sweat glands in re-epithelialization of wounds and prevention of dehydration. We also identified 2 previously missing proteins in our analysis. Using a proteogenomic approach, we identified 7 peptides from 5 novel genes, which we validated using synthetic peptides. Most of the novel proteins were from short open reading frames (sORFs), suggesting that many sORFs still remain to be annotated in the human genome. This study presents the first integrated analysis of the transcriptome and proteome of the human eccrine sweat gland and should serve as a valuable resource for studying sweat glands in physiology and disease.


INTRODUCTION
There are approximately 2-4 million sweat glands dispersed throughout the human body [1]. Two distinct types of sweat glands exist in human skin: eccrine sweat glands, which are not connected to a hair follicle, and apocrine sweat glands, which are connected to hair follicles [2]. The duct of the eccrine sweat gland opens directly onto the skin surface, enabling the gland to secrete a water- and salt-based liquid. In contrast, the apocrine sweat gland releases fluid and oily substances through the hair follicle orifice [3]. Eccrine sweat glands are the major sweat glands in humans and are distributed over the entire body. The physiologic role of eccrine sweat glands is to produce sweat in response to a hot environment or exercise and in response to emotional stimuli such as fear, anxiety and pain [4]. The function of eccrine sweat glands is controlled by the hypothalamus through activation of post-ganglionic cholinergic branches of the sympathetic nervous system [5,6]. Beyond thermoregulation, understanding the composition and function of eccrine sweat glands can help to better understand conditions such as cystic fibrosis (CF), hyperhidrosis, anhidrosis, and atopic dermatitis [7,8]. Furthermore, the role of eccrine sweat gland stem cells in wound healing is being increasingly recognized, and regrowth of sweat glands in damaged skin, such as in burn victims, has tremendous implications [9]. One key step toward understanding human sweat glands is to obtain an accurate catalog of the transcripts and proteins that they express [10]. Although a tremendous effort has been put into sequencing transcripts and proteins of a number of human tissues and cell types, there are still no studies that pertain to human sweat glands despite their obvious importance in physiology and disease. Our present study concentrates on human eccrine sweat glands.
We carried out an in-depth analysis of the transcriptome and proteome of eccrine sweat glands isolated by microdissection from human skin. First, we compared the RNA-seq data for eccrine sweat glands with the transcript data of other human tissues available in the Genotype-Tissue Expression (GTEx) repository. Next, we compared the proteome of human eccrine sweat glands to proteomes of other human tissues and cells [11]. In addition, we compared the eccrine sweat gland proteome to the sweat proteome. We also investigated whether we could identify missing proteins from the proteomic analysis of human eccrine sweat glands. Finally, we performed a proteogenomic analysis to identify any novel genes or protein isoforms using the transcriptome and proteome data.

Isolation of eccrine sweat glands
Deidentified skin, available as excess material, was obtained from otherwise healthy, non-CF adult patients undergoing unrelated reconstructive surgery in the Department of Plastic and Reconstructive Surgery at the Johns Hopkins Hospital. This study was exempt from IRB approval according to institutional protocol. To maintain the donor's anonymity, only the region of sampling was provided as per the approval, and no clinical or demographic information was attached to the sample. A 25 cm × 15 cm piece of skin was collected from the abdominal region of a single donor. Eccrine sweat glands were dissected out from the skin dermis as previously described [12][13][14]. Approximately 250 eccrine sweat glands were collected and pooled. Briefly, the skin was excised vertically with sharp scissors into small pieces. The cut pieces were bathed in phosphate-buffered Ringer's solution on ice. Neutral red (10 µM) was used to stain the sweat glands. After a few minutes of incubation with neutral red, the sweat glands were viewed through a Zeiss
To maximize the identification of proteins, the sweat gland proteins were extracted by two different methods: the filter-aided sample preparation (FASP) approach and the guanidine hydrochloride (GuHCl) approach. For the FASP approach, sweat glands were sonicated in 50 mM TEAB/4% SDS/10 mM DTT with Halt protease inhibitor cocktail (Thermo Scientific) for 5 min followed by heating at 95°C for 5 min. After cooling, the samples were alkylated with 30 mM iodoacetamide at room temperature for 30 min followed by centrifugation at 17,000 x g for 10 min. SDS in the sample was removed by the FASP method [15]. Briefly, the lysate was diluted with 50 mM TEAB/8 M urea and concentrated with a Centricon with a 30 kDa MWCO for 40 min at room temperature. This buffer exchange step was repeated five times. Sweat gland proteins were digested with Lys-C at room temperature for 3 h followed by further digestion with trypsin overnight.
For the GuHCl approach, sweat glands were sonicated for 5 min at 35% amplitude in 8 M GuHCl, 50 mM HEPES, pH 7.0, 10 mM dithiothreitol in the presence of Halt protease inhibitor cocktail (Thermo Scientific) followed by heating at 90°C for 3 min and centrifugation at 16,000 x g for 10 min. The pellet was resuspended in 50 mM TEAB/4% SDS/10 mM DTT and subjected to further protein extraction with a barocycler for 60 cycles, in which each cycle consisted of 45,000 psi at 95°C for 50 sec and atmospheric pressure at 25°C for 10 sec. The proteins were alkylated with 30 mM iodoacetamide at room temperature for 15 min. The proteins were digested with trypsin at an enzyme to protein ratio of 1:50 at 37°C overnight. Following enzyme digestion, the peptides were desalted using a Sep-Pak C18 cartridge. The peptides prepared through the FASP and GuHCl methods were fractionated into 24 fractions by basic pH reversed-phase liquid chromatography. Briefly, lyophilized samples were reconstituted in solvent A (10 mM triethylammonium bicarbonate, pH 8.5) and loaded onto an XBridge C18, 5 μm, 250 × 4.6 mm column (Waters, Milford, MA). Peptides were resolved using a gradient of 3 to 50% solvent B (10 mM triethylammonium bicarbonate in acetonitrile, pH 8.5) over 50 min, collecting 96 fractions. The fractions were subsequently concatenated into 24 fractions followed by vacuum drying using a SpeedVac. The dried peptides were resuspended in 15 µl of 10% formic acid (FA) and the entire amount was injected. The peptides prepared through the barocycler were fractionated by SCX StageTip as previously reported [16].
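The concatenation of 96 collected fractions into 24 pooled fractions can be illustrated with a short sketch. The text does not state which pooling scheme was used; the interleaved scheme below (pooling every 24th fraction) is a common convention for basic pH RPLC and is assumed here purely for illustration. The function name `concatenate_fractions` is hypothetical.

```python
def concatenate_fractions(n_collected=96, n_pooled=24):
    """Assign collected fractions (1-based) to pooled fractions (1-based).

    Assumed scheme: fraction i goes to pool ((i - 1) mod n_pooled) + 1,
    so early-, mid- and late-eluting peptides are mixed in each pool.
    """
    if n_collected % n_pooled != 0:
        raise ValueError("n_collected must be a multiple of n_pooled")
    pools = {pool: [] for pool in range(1, n_pooled + 1)}
    for fraction in range(1, n_collected + 1):
        pools[(fraction - 1) % n_pooled + 1].append(fraction)
    return pools


pools = concatenate_fractions()
print(pools[1])   # fractions combined into the first pooled fraction
print(pools[24])  # fractions combined into the last pooled fraction
```

Under this assumed scheme each pooled fraction contains four collected fractions spaced 24 apart, which spreads peptide hydrophobicity evenly across the LC-MS/MS runs.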

Mass spectrometric analysis
The fractionated peptides were analyzed on an LTQ-Orbitrap Elite mass spectrometer (Thermo Scientific, Bremen, Germany) coupled to an EASY-nanoLC II system or on an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Scientific, San Jose, CA, USA) coupled to an EASY-nanoLC 1200 nanoflow liquid chromatography system (Thermo Scientific, Odense, Southern Denmark). The peptides from each fraction were reconstituted in 10% formic acid and loaded onto a trap column (100 μm × 2 cm) at a flow rate of 5 μl per minute for the LTQ-Orbitrap Elite mass spectrometer, or onto an Acclaim PepMap100 Nano Trap Column (100 μm × 2 cm) (Thermo Scientific) packed with 5 μm C18 particles at a flow rate of 5 μl per minute for the Orbitrap Fusion Lumos Tribrid mass spectrometer. The loaded peptides were resolved at a flow rate of 250 nl/min using a linear gradient of 10 to 35% solvent B (0.1% formic acid in 95% acetonitrile) over 95 minutes on an analytical column (50 cm × 75 μm ID) packed in house for the LTQ-Orbitrap Elite mass spectrometer or on an EASY-Spray column (50 cm × 75 μm ID, PepMap RSLC C18, 2 μm particles) (Thermo Scientific) for the Orbitrap Fusion Lumos Tribrid mass spectrometer. The EASY-Spray column was fitted on an EASY-Spray ion source operated at 2.0 kV. Mass spectrometry analysis on the LTQ-Orbitrap Elite mass spectrometer was carried out in a data-dependent manner with a full scan in the range of m/z 350 to 1,700 in top-N mode, selecting the 8 most intense ions. Both MS and MS/MS were acquired and measured using the Orbitrap mass analyzer. Full MS scans were measured at a resolution of 120,000 at m/z 400. Precursor ions were fragmented using higher-energy collisional dissociation and detected at a mass resolution of 30,000 at m/z 400. Automatic gain control was set to 1 million ions for full MS and to 0.2 million ions for MS/MS, with maximum ion injection times of 200 and 300 ms, respectively.
Dynamic exclusion was set to 60 sec and singly charged ions were rejected. Internal calibration was carried out using the lock mass option (m/z 445.1200025) from ambient air. Mass spectrometry analysis on the Orbitrap Fusion Lumos Tribrid was carried out in a data-dependent manner with a full scan in the range of m/z 350 to 1,550 in top-speed mode with a 3 sec cycle. Both MS and MS/MS were acquired and measured using the Orbitrap mass analyzer. Full MS scans were measured at a resolution of 120,000 at m/z 200. Precursor ions were fragmented using higher-energy collisional dissociation and detected at a mass resolution of 30,000 at m/z 200. Automatic gain control was set to 1 million ions for full MS and to 0.05 million ions for MS/MS, with maximum ion injection times of 50 and 100 ms, respectively. Dynamic exclusion was set to 30 sec and singly charged ions were rejected. Internal calibration was carried out using the lock mass option (m/z 445.1200025) from ambient air.

Data analysis
The Proteome Discoverer (v 2.1; Thermo Scientific) suite was used for identification and quantitation. The tandem mass spectrometry data were searched using the SEQUEST HT search algorithm against a human RefSeq database (version 70) supplemented with frequently observed contaminants (99,847 entries for the RefSeq database and 115 common contaminant proteins). The search parameters were as follows: a) trypsin as the proteolytic enzyme (with up to two missed cleavages); b) peptide mass error tolerance of 10 ppm; c) fragment mass error tolerance of 0.02 Da; d) carbamidomethylation of cysteine (+57.02146 Da) as a fixed modification and oxidation of methionine (+15.99492 Da) as a variable modification. The number of PSMs for shared peptides was counted redundantly for all the proteins to which the shared peptides were mapped. Peptides and proteins were filtered at a 1% false discovery rate using the Percolator node. Protein abundance values were calculated using the normalized spectral abundance factor (NSAF) [17]. For the spectral counting of NSAF, modified peptides and shared peptides were included. Gene ontology analysis was performed with DAVID [18,19]. The heatmaps for the transcriptome and proteome were generated by the pheatmap R package and Perseus software, respectively [20].

Proteogenomic analysis
For the identification of novel genes expressed at the protein level, a custom database was generated using an in-house Python script as follows. RNA sequences from the sweat gland RNA-seq data generated in this study were translated in three frames, and the translated sequences were then split at every stop codon into individual proteins. Protein entries that were shorter than 6 amino acids or redundant were removed, leaving 4,723,976 entries in the database after filtering. To avoid forcing MS/MS spectra of peptides derived from reference protein sequences to match non-reference proteins, proteins annotated in the reference database were retained in the custom database.
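The three-frame translation step described above can be sketched as follows. This is a minimal illustration and not the in-house script used in the study: the function names (`translate`, `three_frame_orfs`) are hypothetical, the standard genetic code is assumed, and the handling of ambiguous bases (segments containing unknown codons are dropped) is an assumption that the text does not specify.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence in frame 0; '*' marks stop codons."""
    seq = seq.upper().replace("U", "T")
    # Unknown codons (e.g. containing N) are translated to 'X'.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_orfs(seq, min_length=6):
    """Translate in three forward frames, split at stops, filter and dedupe."""
    entries = set()  # a set removes redundant entries
    for frame in range(3):
        for segment in translate(seq[frame:]).split("*"):
            # Keep entries of at least min_length residues (shorter ones are
            # removed, as in the text); dropping 'X' segments is an assumption.
            if len(segment) >= min_length and "X" not in segment:
                entries.add(segment)
    return entries


print(sorted(three_frame_orfs("ATGGCTGCTGCTGCTGCTTAAGG")))
```

Note that the split is made at every stop codon without requiring an initiator methionine, which matches the description in the text and allows peptides from the middle of an open reading frame to be matched.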
The database search against the custom database was conducted in the same way as the one against the reference database, except that enzyme non-specificity was allowed on the peptide N-terminal side. Any peptides that had identical matches to proteins in the NR human protein database were removed. For the identification of peptides mapping to novel or partially novel splice junctions, a custom database was generated using RSeQC software and an in-house Python script as follows. Splice junction sequences were extracted with RSeQC software from the sweat gland RNA-seq data generated in this study and translated, and the translated sequences were split at every stop codon into individual proteins. Protein entries that were shorter than 6 amino acids, redundant or not spanning a splice junction were removed, leaving 130,789 entries in the database after filtering. The database search was conducted in the same way as the search against the reference database, applying a 1% FDR at the peptide and protein levels. Any peptides that had identical matches in the NR human protein database were removed.
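The NSAF abundance measure used in the data analysis can be written out explicitly. For protein i with spectral count SpC_i and length L_i, NSAF_i = (SpC_i / L_i) / sum over all proteins j of (SpC_j / L_j). The sketch below is a minimal illustration of that formula; the function name `nsaf` and the example protein names are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: dict mapping protein -> number of PSMs
    lengths: dict mapping protein -> protein length in amino acids
    """
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {protein: spectral_counts[protein] / lengths[protein]
           for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts to normalize")
    # Normalizing by the sum makes the values of one sample add up to 1.
    return {protein: value / total for protein, value in saf.items()}


values = nsaf({"protA": 10, "protB": 10}, {"protA": 100, "protB": 400})
print(values)
```

Because the values within a sample sum to 1, NSAF allows proteins of different lengths, and datasets of different depth, to be compared on a common scale.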

Validation of peptides mapping to missing or novel proteins using synthetic peptides
The identified peptides mapping to missing or novel proteins were synthesized (JPT Peptide Technologies, Berlin, Germany). The peptides were reconstituted in 0.1% formic acid, mixed, and approximately 1 pmol of each peptide was analyzed on the Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Scientific) as described in the mass spectrometric analysis section. The database search was performed in the same manner as described in the data analysis section, except that the database was composed of the synthetic peptides and 248 common contaminant proteins. Spectra of peptides from the samples were validated by aligning them with those from synthetic peptides using in-house software written in Python.
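One common way to compare an experimental MS/MS spectrum with that of a synthetic peptide is a normalized dot product (cosine similarity) over matched fragment peaks. The text does not describe how the in-house software scores the alignment, so the sketch below is only an assumed illustration; the function name `spectral_cosine` is hypothetical, and the default tolerance of 0.02 Da is borrowed from the fragment mass tolerance stated in the data analysis section.

```python
import math


def spectral_cosine(spectrum_a, spectrum_b, tolerance=0.02):
    """Cosine similarity between two spectra given as (m/z, intensity) lists.

    Peaks are paired greedily in m/z order when they differ by at most
    `tolerance` (in Da); unmatched peaks contribute only to the norms.
    """
    a = sorted(spectrum_a)
    b = sorted(spectrum_b)
    i = j = 0
    dot = 0.0
    while i < len(a) and j < len(b):
        diff = a[i][0] - b[j][0]
        if abs(diff) <= tolerance:
            dot += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    norm_a = math.sqrt(sum(intensity ** 2 for _, intensity in a))
    norm_b = math.sqrt(sum(intensity ** 2 for _, intensity in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative peak lists (made-up values, not data from this study).
sample = [(175.119, 40.0), (286.151, 100.0), (357.188, 65.0)]
synthetic = [(175.120, 42.0), (286.150, 98.0), (357.190, 60.0)]
print(round(spectral_cosine(sample, synthetic), 3))
```

A score close to 1 indicates that the two spectra share fragment ions with similar relative intensities, which is the pattern expected when an endogenous peptide and its synthetic counterpart have the same sequence.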

Experimental Design and Statistical Rationale
No technical or biological replicate experiments were conducted. Pieces of human skin large enough to provide >200 eccrine sweat glands are scarce, which hampered our ability to conduct replicate analyses. Although this study was conducted with sweat glands from only one individual, it is the first study in which the transcriptome and proteome of the sweat gland were analyzed, and the main focus of this manuscript is profiling expressed proteins to identify missing or novel proteins.

RESULTS
In this study, we microdissected eccrine sweat glands from healthy adult human skin to perform an integrated transcriptomic and proteomic analysis as described in the following sections and summarized in Figure 1.

RNA-seq analysis of human eccrine sweat glands
We performed RNA-seq on human eccrine sweat glands to investigate the RNA expression profile of sweat glands and to compare it to other human tissues. Two hundred twelve million paired-end reads were obtained and 193 million reads were mapped to the reference human genome hg38. Of these, 50 million reads mapped to known protein-coding genes, ~23 million reads mapped to processed pseudogenes, and about 15 million reads mapped to long non-coding RNAs (Fig. 2A). This led to the identification of 13,343 protein-coding genes and 53,432 transcripts (RPKM ≥ 1) (Fig. 2B). The majority of genes followed a normal distribution, which was centered at a value of ∼2.14 log2 RPKM (∼4.39 RPKM), whereas the remaining genes formed a shoulder to the left of the normal distribution (Fig. 2B). Among the 53,432 expressed transcripts, 38,358 were annotated in GENCODE and the remainder were novel isoforms arising from alternative splicing and extension events (Supplemental Table S1 and S2). Investigation of the exon-exon junctions revealed that ~70% of the junctions were formed by known splice donors and acceptors, ~22% were formed by either a new splice donor or a new acceptor, and 8% were formed by both a new splice donor and a new acceptor (Supplemental Table S1 and S2). To determine if eccrine sweat glands exhibited a unique expression profile, we compared their RNA-seq data to that from other human tissues deposited in the GTEx repository using unsupervised hierarchical clustering of global transcriptome data (Supplemental Table S3). Kidney tissue showed the closest expression profile to eccrine sweat glands (Fig. 2C).
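For readers unfamiliar with the expression unit used above, RPKM (reads per kilobase of transcript per million mapped reads) is conventionally computed as read count × 10^9 / (transcript length in bp × total mapped reads). The sketch below illustrates this standard definition together with the RPKM ≥ 1 expression cut-off; the function names and the example numbers are hypothetical and do not come from the study's data.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_expressed(value, threshold=1.0):
    """Apply the RPKM >= 1 cut-off used to call a gene or transcript expressed."""
    return value >= threshold


# Made-up example: 1,000 reads on a 2,000 bp transcript, 50 million mapped reads.
example = rpkm(1_000, 2_000, 50_000_000)
print(example, is_expressed(example), math.log2(example))
```

The log2 transform shown in the last line is the scale on which the distribution in Fig. 2B is described.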

Proteomic analysis of human eccrine sweat glands
Since there has not been any published proteomic analysis of human eccrine sweat glands, we employed LC-MS/MS to investigate the proteome of eccrine sweat glands. Eccrine sweat gland proteins were extracted in two different buffers containing either SDS or guanidine hydrochloride (GuHCl). The pellet left after protein extraction with GuHCl was subjected to further extraction using a barocycler to maximize protein recovery. The peptides were analyzed by LC-MS/MS, and the mass spectra were searched using Sequest HT, resulting in the identification of 6,100 proteins and 56,936 peptides (Supplemental Table S4). These included chloride intracellular channel protein, sodium potassium-transporting ATPase subunit, sodium bicarbonate transporter, voltage-gated potassium channel, calcium-activated potassium channel, voltage-dependent calcium channels, V-H+-ATPase, AQP5, and carbonic anhydrases. The proteins extracted with the three different methods were analyzed independently by LC-MS/MS. The extracted proteins were digested sequentially with Lys-C and trypsin. The peptides from proteins extracted with the FASP method or the GuHCl method were fractionated by basic pH RPLC, while those from proteins extracted by barocycler were fractionated by a StageTip SCX column. Next, we compared the expression profile of human eccrine sweat glands to Human Proteome Map data after normalizing all the data by the NSAF quantification method (Supplemental Table S5) [21]. We identified proteins that showed the highest abundance in eccrine sweat glands, with >2-fold higher abundance than in the tissue or cell type with the next highest abundance. Based on this, we could classify the proteins enriched in sweat glands into three groups: proteins that are expressed in most tissues, proteins that are expressed at a lower level in hematopoietic cells but at a higher level in adult tissues, and proteins that are expressed almost exclusively in sweat glands (Fig. 3A).
In the first group, proteins including LLGL2, RPS28, H2AFY, HNRNPR, ITPR2, LMNA, ELN, S100A11, COL3A1, EPPK1, KRT17, EPHB3, DBI, COX6B1, PYGB, PDK3, DOCK9, SLC1A5, KRT77, GAPDH and H1F0 showed relatively even abundance throughout most human tissues or cells. Pathway analysis showed that these genes are enriched in protein digestion and absorption. In the second group, proteins including SLC12A2, COL7A1, PIP, CALML5, ITGB4, DSG2, CMA1, POSTN, COL6A2 and COL6A3 showed relatively lower abundance in hematopoietic cells. Pathway analysis showed that these genes are enriched in ECM-receptor interaction, protein digestion and absorption, focal adhesion and the PI3K-Akt pathway. The role of these proteins in cell-cell interactions may explain why they were in low abundance in hematopoietic cells. In the third group, proteins found to be expressed primarily in eccrine sweat glands included FAM180B, PLA2R1, MUCL, KCNN4, PLCB4, PROM2, CLDN7, COL26A1, SCGB2A2, CD300LG, FAM160A1, ACOT6, PRR4, TRPV4, COL17A1, SCGB1D2 and AQP5 (Fig. 3A). Pathway analysis suggested that these genes are enriched in salivary secretion, insulin secretion, protein digestion, reabsorption and inflammation-mediated regulation of TRP channels. Although this is the first study to analyze the human eccrine sweat gland proteome, a previous study described the expression of ~150 proteins in human sweat [22]. We therefore examined the overlap between the eccrine sweat gland and sweat proteomes. As expected, the majority of sweat proteins (121 out of 149) were identified in eccrine sweat glands (Fig. 3B) [23]. Interestingly, sixty-two of them were among the 500 most abundant proteins in eccrine sweat glands (Supplemental Table S3).
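The enrichment criterion described above (highest abundance in eccrine sweat glands and more than 2-fold above the tissue or cell type with the next highest abundance) can be expressed as a simple test on per-tissue NSAF values. The sketch below is an illustration only; the function name `is_enriched`, the tissue labels and the abundance values are hypothetical.

```python
def is_enriched(abundance, target="eccrine sweat gland", fold=2.0):
    """Return True if `target` has more than `fold` times the abundance of
    the next highest tissue or cell type.

    abundance: dict mapping tissue or cell type -> NSAF value for one protein
    """
    if target not in abundance:
        return False
    others = [value for tissue, value in abundance.items() if tissue != target]
    if not others:
        return False
    # Exceeding fold * next-highest also implies the target is the maximum.
    return abundance[target] > fold * max(others)


# Made-up NSAF values for two hypothetical proteins.
protein_1 = {"eccrine sweat gland": 8.0e-4, "kidney": 1.5e-4, "B cells": 0.0}
protein_2 = {"eccrine sweat gland": 3.0e-4, "kidney": 2.0e-4, "B cells": 1.0e-5}
print(is_enriched(protein_1), is_enriched(protein_2))
```

Proteins passing this test could then be grouped by their abundance pattern across the remaining tissues, as in Fig. 3A.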
Efforts are ongoing to identify the products of all genes at the protein level, but there are still 2,187 missing proteins that have never been identified [24]. This could be due to the complex architecture of human organs and the expression of proteins in specific cell types or specific locations within an organ. Since human eccrine sweat glands showed distinct patterns of protein expression, we reasoned that this unique expression profile could help us identify missing proteins. We identified 2 proteins that met the criteria for missing proteins: the protein must be identified with 2 or more peptides, each longer than 7 amino acids, and validated by synthetic peptides (Fig. 3C, Table 1 and Supplemental Figure S1).
These proteome analysis results indicate that sweat glands are quite distinct from other human organs and that the proteins abundant in sweat glands are enriched in secretion and absorption functions. This could stem from the fact that the sweat gland is a secretory gland and that it was analyzed as a microdissected organ isolated from human skin. In addition, the proteome analysis of this unusual human organ enabled us to identify 2 validated missing proteins, implying that further micro-scale analyses of human organs such as sweat glands will lead to the identification of more missing proteins.

Proteogenomic analysis of human eccrine sweat glands
We expected that combining the proteome data with the transcriptome data would provide information about the expression profiles of sweat glands at two different levels, transcriptome and proteome. This could also lead us to identify novel genes that are expressed in sweat glands, because we would have expression evidence at both the transcriptome and proteome levels. First, we explored the relationship between transcript and protein abundance in eccrine sweat glands by correlating the FPKM values of transcripts and the NSAF values of proteins. A Spearman's correlation of 0.458 between FPKM and NSAF values was found. This was not unexpected because it has previously been shown that the integration of RNA-seq with proteomics generally gives a correlation of ~0.40 (Supplemental Figure S2) [25]. Next, we attempted to identify novel genes for which we had evidence of expression at both the transcriptome and proteome levels. For this, we generated a customized protein database by 3-frame translation of the RNA sequences of eccrine sweat glands and performed a search against the newly generated database. From the database search, we identified 12 novel peptides mapping to 10 different genes, comprising 5 protein-coding genes, 4 pseudogenes and 1 lncRNA. The identified peptides were validated by comparing their MS/MS spectra to those of synthetic peptides. The spectra of 7 peptides mapping to 5 different genes, comprising 1 protein-coding gene, 3 pseudogenes and 1 lncRNA, perfectly matched those of the synthetic peptides (Table 2). One of the 7 peptides, AGAPGPAQR, was N-terminally acetylated and mapped to the middle part of an alternative frame of the protein-coding gene CTNNA2 (Fig. 4A). The spectrum of the acetylated peptide showed a pattern very similar to that of the synthetic peptide (Fig. 4B). Strikingly, 5 of the 7 validated peptides were from pseudogenes.
In particular, two pseudogenes, UOX and GAPDHP66, were each identified by two different peptides (Table 2). The two peptides of UOX mapped to the central region of the pseudogene in the same frame, 53 amino acids apart (Fig. 4C). The spectra of the two peptides from sweat glands showed perfect matches to those of the synthetic peptides (Fig. 4D and E). The two peptides of GAPDHP66 mapped to almost the same region of the gene with an overlap of 19 amino acids. The spectra of the two peptides showed perfect matches to those of the synthetic peptides. On the other hand, the peptide from the lncRNA LINC00869 mapped to the 3' part of the gene (Fig. 5D). The MS/MS spectrum of the peptide, AILALLILK, showed a perfect match to the one obtained from fragmentation of the corresponding synthetic peptide (Fig. 5E). As described above, 8% of the splice junctions were novel and 22% were partially novel. A junction supported at both the RNA and protein levels is likely to be a bona fide novel junction. Thus, we investigated whether there were any novel splice junctions that could be detected at both levels. We discovered 3 novel junctions with protein-level evidence; two of them had reliable mass spectra, and one of them was validated by a synthetic peptide. A novel exon junction with three additional in-frame nucleotides was mapped to COL6A3 (Fig. 6A). The spectrum of the peptide from COL6A3 showed a perfect match to that of the synthetic peptide (Fig. 6B). These novel proteins expressed in the sweat glands suggest that there could still be many novel proteins expressed from sORFs or pseudogenes. UOX has been classified as a unitary pseudogene, that is, a pseudogene without a parent gene, although other species express a protein from this gene. The discovery of peptides from this unitary pseudogene suggests that the gene has been wrongly classified as a pseudogene, since it shows protein expression in human cells.
Thus, these results again confirm that we need to reconsider the classification algorithm of the pseudogenes.
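The transcript-protein correlation reported at the start of this section is a Spearman rank correlation, that is, a Pearson correlation computed on the ranks of the FPKM and NSAF values. The sketch below shows the calculation in plain Python with average ranks for ties; it is an illustration of the statistic rather than the analysis code used in the study, and the function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Made-up paired values for five genes (FPKM, NSAF).
fpkm = [1.2, 15.0, 3.4, 80.0, 7.7]
nsaf_values = [2e-5, 9e-5, 6e-5, 4e-4, 3e-5]
print(round(spearman(fpkm, nsaf_values), 3))
```

A rank-based measure is a natural choice here because FPKM and NSAF are on different scales and both are strongly skewed.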

DISCUSSION
In this study, we report the first in-depth RNA and protein expression profiles of human sweat glands. With the publication of the human proteome [11,26] and the availability of RNA-seq [27], there are several databases that catalog gene and protein expression profiles from diverse tissues and cell types. However, there is no such reference available for sweat glands, despite their importance in health and disease. Our study was designed to address an open question about the diversity of RNAs (protein-coding and non-coding) and proteins that are expressed in eccrine sweat glands. To do so, we conducted RNA sequencing and proteomic analysis on ~250 sweat glands collected from a healthy individual. In this pursuit, we identified ~13,300 protein-coding genes and ~53,400 transcripts that were expressed at the RNA level, and detected ~6,100 proteins in eccrine sweat glands. This gene expression profile can serve as a useful resource to better study the etiology of many diseases relevant to the sweat gland, such as cystic fibrosis, hyperhidrosis, anhidrosis and atopic dermatitis.
Comparison of the eccrine sweat gland transcriptome to that of other human tissues and cells revealed its similarity to the cortex of the kidney. One biologic explanation is that sweat glands share some of the excretory functions of the kidney [28]. In this regard, a striking example is the excretion of urea. The levels of urea in sweat are much higher than in serum, indicating that sweat glands should express urea transporters [29]. A prior study has shown that urea transporter subtypes are expressed in sweat glands [30]. Interestingly, urea levels in sweat are even higher in individuals with chronic renal disease, resulting in uremic frost [31]. In this study, we demonstrate that urea transporter 1 isoform 1 (SLC14A1), a urea channel known to facilitate transmembrane urea transport, is indeed detectable in both the transcriptome and proteome of the eccrine sweat gland. Additionally, we demonstrate that there is extensive overlap in gene expression between kidney and sweat glands, indicating that many other genes are also similarly expressed in the kidney and eccrine sweat glands. Together, these findings support the view that sweat is another route, besides urine, for expelling water and metabolic wastes.
Evidence from the human eccrine sweat gland proteome is consistent with previous studies that identified physiologic roles of Na+/K+-ATPase, K+ channels, Ca2+ channels, V-H+-ATPase, and AQP5 in secretory mechanisms [12,32-35]. Their normal activities result in a sweat secretory fluid that is isotonic to plasma, with concentrations of Na+ of ~145 mmol/l and Cl- of ~115 mmol/l [36,37]. Additionally, detection of the sodium potassium pump and sodium hydrogen ion exchanger 1 in the eccrine sweat gland proteome supports the concept that the driving force for reabsorption is a Na+ gradient into the cell [7]. The loss of Na+ from the reabsorptive duct of the eccrine sweat gland creates a gradient for Cl- to pass through the cell and into the extracellular fluid via CFTR and carbonic anhydrases [7,34,38]. As this happens, the final sweat becomes hypotonic to plasma, with concentrations of Na+ and Cl- below 20 mmol/l [39]. It is well established that disease-causing variants in CFTR result in defective reabsorption and elevated sweat chloride concentration; therefore, sweat chloride measurement is the gold standard method for the diagnosis of cystic fibrosis (CF) [40,41]. Although we observed CFTR mRNA expression by RNA-seq, the protein could not be detected in the proteome of the eccrine sweat gland, likely because the levels of CFTR protein were too low to detect.
Importantly, we have recently reported deleterious variants in carbonic anhydrase 12 (CA12) that resulted in dysfunctional protein in individuals exhibiting CF-like phenotypes, namely elevated sweat chloride concentration and lung disease [42]. Our present study confirms the expression of CA12 in the proteome and transcriptome of eccrine sweat glands. This finding supports the importance of this protein in regulating sweat chloride levels. We were also able to detect carbonic anhydrases 2, 3, 4 and 13 in the proteome and transcriptome of the sweat gland, indicating that they may also alter electrolyte or pH balance in the sweat. Strikingly, the proteomics data also showed enrichment for inflammation-mediated regulation of TRP channels, including TRPV4. TRPV4 is a nonselective calcium ion channel that is activated by moderate heat (25-34°C), endogenous inflammatory metabolites (like arachidonic acid), and exogenous compounds (like phorbol esters) [43][44][45][46]. TRPV4 appears to be important for skin barrier integrity and contributes to cell-cell junction development, which prevents excess skin dehydration [47]. It has been shown in mouse models that the sweat duct is the growth center that repairs epidermal injury [3]. Interestingly, a study demonstrated that mice lacking TRPV4 have altered temperature behavior under normal and inflammatory conditions [48]. Notably, sweat glands in mice are restricted to the foot pads, whereas in humans there are millions of sweat glands dispersed throughout the body [4]. More recent studies readdressing this issue using human skin concluded that sweat glands contribute significantly to re-epithelialization after partial-thickness wounds [49]. Here we provide both transcriptome- and proteome-level evidence that TRPV4 is indeed expressed in eccrine sweat glands. Importantly, a TRPV3-related skin TRP channelopathy, Olmsted syndrome, which is caused by gain-of-function missense mutations, has been identified in humans [50].
Whether genetic alterations in human TRPV4 are responsible for sweat gland dysfunction that leads to skin dehydration and/or inflammation remains to be determined.
We identified two missing proteins in eccrine sweat glands. It is difficult to say whether these two proteins are unique to sweat glands. However, because they were detected in eccrine sweat glands but have not been identified in other proteomic studies, it is reasonable to consider that they are relatively more abundant in sweat glands. Lastly, we performed a proteogenomic analysis that combined the transcriptome and proteome data to identify novel proteins that could not have been identified using a reference database alone. We identified 3 reliable peptides from novel splice junctions, one of which was validated with a synthetic peptide. Strikingly, the majority of novel proteins were from pseudogenes. These results imply that our understanding of pseudogenes has been incomplete, and that deeper proteome analyses of microscale human tissue samples or individual primary cell types could identify more proteins expressed from genes currently classified as non-coding. In addition, the majority of novel proteins were from sORFs. Recent research on sORFs has shown that they can have functions of their own, yet they have largely been overlooked as a source of functional proteins. Thus, sORFs could be one of the missing dimensions in understanding biological functions, and our findings suggest that many sORFs remain to be discovered. We anticipate that this study will assist in a better molecular understanding of human sweat glands and their role in health and disease.
Overall, this study can be used as a starting point for future studies to assess variability between individuals of different genetic backgrounds, sex, and age. Studies with multiple individuals should assess batch effects to correct for technical variability [51,52]. Additionally, single-cell RNA-seq and proteome analysis of the eccrine sweat gland would further help identify gene expression in specific cell types, as in two recent studies demonstrating that CFTR is mostly expressed in a rare cell type, the ionocyte, in the airway epithelium [53,54]. Taken together, this study describes the transcriptome and proteome expression profile of the eccrine sweat gland, which can be used to study the etiology of many diseases relevant to the sweat gland, such as cystic fibrosis, hyperhidrosis, anhidrosis and atopic dermatitis.
Figure 1. Schematic diagram of the research strategy. Human eccrine sweat glands were microdissected from human skin. Top left: diagrammatic representation of skin and its appendages. Top right: micrograph of an eccrine sweat gland dissected from the skin using a stereomicroscope. Transcriptome and proteome analyses were conducted after the extraction of RNAs and proteins, respectively. For transcriptome analysis, the extracted mRNAs were used for library preparation and subjected to RNA-seq. The RNA-seq data were analyzed using HISAT2, StringTie and RSeQC. For proteome analysis, the extracted proteins were digested with Lys-C followed by trypsin. The resulting peptides were pre-fractionated by either bRPLC or an SCX StageTip, followed by LC-MS/MS analyses. Acquired mass spectrometry data were searched against a reference protein database, and the resulting unmatched spectra were subjected to another round of searching against customized databases generated by 3-frame translation of either the RNA-seq or splice junction data.
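The customized databases mentioned in the Figure 1 legend rely on 3-frame translation of transcript sequences. The following Python snippet is a minimal sketch of that idea, not the pipeline used in this study: the function names (`translate_frame`, `three_frame_orfs`), the minimum peptide length `min_len`, and the toy sequence are illustrative assumptions. It assumes the standard genetic code, translates only the forward strand, and splits each frame at stop codons to produce candidate database entries.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(seq, frame):
    """Translate one forward reading frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing ambiguous bases (e.g. N) are translated as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_orfs(seq, min_len=7):
    """Return (frame, peptide) pairs for stop-to-stop segments of at least min_len residues.

    min_len is a hypothetical cutoff chosen for illustration only.
    """
    entries = []
    for frame in range(3):
        for segment in translate_frame(seq, frame).split("*"):
            if len(segment) >= min_len:
                entries.append((frame, segment))
    return entries


if __name__ == "__main__":
    toy_transcript = "ATGGCCAAGTAAGGT"  # made-up sequence, not from the data set
    for frame, peptide in three_frame_orfs(toy_transcript, min_len=3):
        print(frame, peptide)
```

In practice, each retained segment would be written out as a FASTA entry so that spectra left unmatched by the reference database search can be searched against it. Keeping stop-to-stop segments rather than requiring a start codon is one possible design choice that avoids excluding peptides from sORFs with non-canonical starts.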
Gene symbols shown in the figure: SLC12A2, COL7A1, PIP, CALML5, ITGB4, DSG2, CMA1, POSTN, COL6A2, COL6A3, DCD, COL2A1, LLGL2, RPS28, H2AFY, HNRNPR, ITPR2, LMNA, ELN, S100A11, COL3A1, EPPK1, KRT17, EPHB3, DBI, COX6B1, PYGB, FAM180B, RHPN2, PDK3, DOCK9, SLC1A5, KRT77, GAPDH, PLA2R1, GPX5, SERAC1, COL21A1, H1F0, FBLN7, FZD1, MUCL1, PPP1R1B, IRF6, EPB41L4B, TPD52L1, KCNN4, PLCB4, PROM2, CLDN7, COL26A1, SCGB2A2, CD300LG, FAM160A1, ACOT6, PRR4, TRPV4, COL17A1, SCGB1D2