Proteome Analysis of Human Hair Shaft

The human hair proteome was investigated using two-dimensional LC-MS/MS. Among the 343 identified proteins, 70 were detected in high relative abundance, including keratin intermediate filament proteins, largely extractable with denaturants. Over 300 proteins were found to constitute the insoluble complex formed by transglutaminase cross-linking. The intracellular distribution of identified proteins is wide from cytoplasm to nucleus, mitochondria, ribosome, and plasma membrane. These results help rationalize ultrastructural features visible in the mature hair. Keratins and several substrates for transglutaminase were found to be posttranslationally modified by methylation and dimethylation. Evidence for ubiquitination of hair proteins was also obtained.

In recent years, identification of proteins participating in cross-linked structures in epidermis and appendages has benefited from the approach of peptide generation with enzymatic fragmentation followed by separation and amino acid sequencing of the peptides. Applied to the corneocyte-crosslinked envelope, a number of constituent proteins have been identified, including some where the isolated peptides were cross-linked, evidence of their participation in isopeptide bonds (7). Advances in current proteomic technology prompt application of this approach to the hair shaft where much of the protein resists extraction, and hence few data are available on the participants. It is evident, however, that much of the resistance is transglutaminase-dependent because hair from individuals afflicted by TGM1-negative lamellar ichthyosis is subject to considerable extraction (8). Present efforts using multidimensional protein identification technology (MudPIT) (9) permitted identification of 343 proteins in the hair shaft.
Large scale identification of the protein components of hair is anticipated to be of importance in helping understand the biogenesis of this epidermal appendage. Much has been learned regarding the mRNA transcripts and translation products in the living cells of the follicle (1, 10) because they are amenable to study by standard molecular biological techniques, but how the protein products are utilized in the mature hair shaft remains uncertain due in part to the paucity of information on which components are retained and which are discarded during terminal differentiation. Moreover this information is anticipated to help understand the molecular basis of aberrant hair phenotypes or even skin afflictions because, as in the case of the skin fragility/woolly hair syndrome (11) and Netherton syndrome (12), the two may have the same or related origin.

Preparation of Hair Samples
Extraction-Scalp hair samples from three unrelated individuals (designated HR, SM, and JY) were examined. Typically 40 mg of hair were rinsed briefly in 5 ml of 2% SDS, 50 mM sodium phosphate (pH 7.8) and drained. The hair was then immersed in 5 ml of 2% SDS, 50 mM sodium phosphate (pH 7.8), 20 mM DTE and incubated overnight at 65°C. The hair was pulverized by magnetic stirring for an hour at room temperature, and the soluble and insoluble materials were separated by centrifugation. The insoluble material was resuspended in 2% SDS, 50 mM sodium phosphate (pH 7.8), 20 mM DTE; incubated overnight at 65°C; and extracted as before. After five such extractions, the protein content in each fraction (including buffer blanks) was estimated by reaction of aliquots with ninhydrin after digestion with 10% sulfuric acid (13). The first two SDS-DTE treatments were found to remove ≈80% of the total protein extracted. In six experiments, the final insoluble material comprised 13.3 ± 3.9% of the total protein.
Digestion-Aliquots of the soluble (first extract) and insoluble material were incubated for 0.5 h in 2% SDS, 20 mM DTE, 50 mM phosphate buffer (pH 7.8) and then incubated an additional 0.5 h after adding iodoacetamide to 40 mM, all at room temperature. Proteins were precipitated from the soluble extract by addition of 2.5 volumes of ethanol. The soluble and insoluble protein samples were rinsed with 70% ethanol and then with freshly prepared 0.1 M ammonium bicarbonate and were resuspended in fresh 0.1 M ammonium bicarbonate adjusted to 2 M in recrystallized urea. The urea, which at 2 M does not interfere with trypsin action, was added to assist in solubilizing proteins and released peptides. To each suspension was added, to 1% by weight, bovine L-1-tosylamido-2-phenylethyl chloromethyl ketone-treated trypsin (Worthington) that had been stabilized by reductive methylation (14). After 6-8 h at room temperature with constant stirring, a second equal aliquot of the methylated trypsin was added, and the samples were stirred overnight at room temperature. An estimated 92% of the detergent-insoluble material was solubilized, whereas virtually all of the detergent-soluble material was solubilized in this way.

Separation of Hair Peptides by Two Dimensional Chromatography
Strong Cation Exchange Chromatography (SCX)-Trypsin-digested hair samples were dried and redissolved in ~200 µl of Solvent A (see below, Mobile phase A) and then injected onto a polysulfoethyl A cation exchange column (100 × 2.1 mm, 5-µm diameter, 300-Å pore size) from PolyLC (Columbia, MD) at a flow rate of 200 µl/min utilizing the following mobile phases as described (www.proteomecenter.org/): Mobile phase A, 5 mM potassium phosphate (pH 3.0), 25% acetonitrile; Mobile phase B, 5 mM potassium phosphate (pH 3.0), 25% acetonitrile, 350 mM potassium chloride. After the sample was loaded, the run was isocratic for 15 min at 100% mobile phase A, and peptides were eluted using a linear gradient of 0-25% B over 30 min followed by a linear gradient of 25-100% B in 20 min and then held for 5 min at 100% B. Fractions at 2-min intervals were collected and concentrated by vacuum centrifugation.
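The SCX elution program described above is piecewise linear, so the mobile phase composition at any time can be sketched in a few lines. The following is a minimal illustration only: the function names are hypothetical, and ideal mixing with no delay volume is assumed, so the salt concentration reaching the column in practice would lag these values.

```python
# Piecewise-linear SCX gradient as described in the text.
# Each segment: (start_min, end_min, start_%B, end_%B).
SCX_SEGMENTS = [
    (0.0, 15.0, 0.0, 0.0),       # isocratic at 100% mobile phase A
    (15.0, 45.0, 0.0, 25.0),     # linear 0-25% B over 30 min
    (45.0, 65.0, 25.0, 100.0),   # linear 25-100% B in 20 min
    (65.0, 70.0, 100.0, 100.0),  # hold at 100% B for 5 min
]

KCL_IN_B_MM = 350.0  # mobile phase B contains 350 mM KCl; A contains none


def scx_percent_b(t_min: float) -> float:
    """Return the programmed %B at t_min minutes after sample loading."""
    if t_min < 0.0 or t_min > SCX_SEGMENTS[-1][1]:
        raise ValueError("time is outside the 70-min program")
    for t0, t1, b0, b1 in SCX_SEGMENTS:
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: segments cover the whole program")


def scx_kcl_mm(t_min: float) -> float:
    """Programmed KCl concentration (mM), assuming ideal mixing of A and B."""
    return KCL_IN_B_MM * scx_percent_b(t_min) / 100.0


if __name__ == "__main__":
    for t in (0, 15, 30, 45, 55, 70):
        print(f"{t:>3} min: {scx_percent_b(t):6.1f}% B, {scx_kcl_mm(t):6.1f} mM KCl")
```

Because the first gradient is shallow, most of the 2-min fractions are collected at low salt, where tryptic peptides of low charge elute.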
Capillary Reverse Phase LC (cLC)-The SCX fractions were loaded sequentially on an on-line trap column (0.25 × 30 mm, Magic C18AQ, 5 µm, 100 Å) at a flow rate of 10 µl/min with buffer A (see below). After application and removal of salt and urea, the flow rate was decreased to 300 nl/min, and the trap column effluent was switched to a homebuilt fritless reverse phase microcapillary column (0.1 × 180 mm, packed with Magic C18AQ, 5 µm, 100 Å, Michrom Bioresources, Auburn, CA) following a published procedure (15). The reverse phase separation of peptides was performed using a Paradigm MG4 system (Michrom Bioresources) with buffers of 5% acetonitrile, 0.1% formic acid (buffer A) and 80% acetonitrile, 0.1% formic acid (buffer B) using a 150-min gradient (0-10% B for 20 min, 10-45% B for 110 min, and 45-100% B for 20 min).

MS Analysis
LC-MS and LC-MS/MS-Peptide analysis was performed utilizing a Finnigan LCQ Deca XP Plus system (San Jose, CA) coupled directly to an LC column. An MS survey scan was obtained for the m/z range of 400-1,400, and MS/MS spectra were acquired for the three most intense ions from the survey scan. An isolation mass window of 3.0 Da was used for the precursor ion selection, and a normalized collision energy of 35% was used for the fragmentation. Dynamic exclusion for a 2-min duration was used to acquire MS/MS spectra from low intensity ions.
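The acquisition logic above (three most intense precursors per survey scan, with a 2-min dynamic exclusion) can be illustrated with a short sketch. This is not the instrument firmware: the function name, the input layout, and in particular the exclusion tolerance `mz_tol` (set here to half of the 3.0-Da isolation window) are assumptions made for illustration.

```python
def select_precursors(survey_scans, top_n=3, exclusion_min=2.0, mz_tol=1.5):
    """Pick precursors for MS/MS from time-ordered survey scans.

    survey_scans: list of (time_min, [(mz, intensity), ...]).
    Returns a list of (time_min, [selected m/z values]).
    mz_tol is a hypothetical exclusion tolerance, not a value from the text.
    """
    excluded = []  # (mz, expiry_time_min) entries on the exclusion list
    picks = []
    for t, peaks in survey_scans:
        # Drop exclusion entries whose 2-min window has elapsed.
        excluded = [(mz, expiry) for mz, expiry in excluded if expiry > t]
        chosen = []
        # Walk through peaks from most to least intense.
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if not 400.0 <= mz <= 1400.0:
                continue  # outside the survey scan range
            if any(abs(mz - ex_mz) <= mz_tol for ex_mz, _ in excluded):
                continue  # recently fragmented; give weaker ions a turn
            chosen.append(mz)
            excluded.append((mz, t + exclusion_min))
            if len(chosen) == top_n:
                break
        picks.append((t, chosen))
    return picks


if __name__ == "__main__":
    scans = [
        (0.0, [(500.0, 100), (600.0, 90), (700.0, 80), (800.0, 70)]),
        (1.0, [(500.0, 100), (600.0, 90), (700.0, 80), (800.0, 70), (900.0, 60)]),
        (2.5, [(500.0, 100), (800.0, 70)]),
    ]
    for t, mzs in select_precursors(scans):
        print(t, mzs)
```

In the toy input, the weaker ions at m/z 800 and 900 are fragmented in the second scan only because the three dominant ions are still on the exclusion list, which is the purpose of dynamic exclusion stated in the text.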
Data Analysis-The Sequest (16) analysis software (Bioworks version 3.1) was used to find the peptide sequences in a human protein database that best match the observed MS/MS spectra. DTA files (Bioworks version 3.1) in ASCII format for each MS/MS spectrum were generated from the raw data for the peptide mass range of 500-3,500, minimum ion count of 10, and minimum signal of 10^5. The International Protein Index human database (ipi.HUMAN.v3.01, December 3, 2004, downloaded from ftp.ebi.ac.uk/pub/databases/IPI) containing 46,941 entries was used for Sequest searching. Peptide (parent ion) tolerance of 2.5 Da and fragment ion tolerance of 1 Da were allowed, and fixed modification of carbamidomethylation on Cys (+57 Da) and differential modification of oxidation on Met (+16 Da) were used. DTASelect software (17) was used to filter out low scoring matches. The filtering criteria were as follows: cross-correlation (Xcorr) values larger than 1.9, 2.2, and 3.75 were required for singly, doubly, and triply charged ions, respectively, for both half- and fully tryptic peptides; matches with ΔCn values (the difference in Xcorr with the next highest value) less than 0.08 were removed (18-20). Proteins with at least two peptide ions satisfying the above criteria were considered correctly identified. Manual examination was carried out for all proteins identified with fewer than five peptides. Criteria used for manual validation (18) included (a) good quality MS/MS spectra with fragment ions clearly above base-line noise, (b) a minimum of four continuous y or b series ions observable, (c) intense y and b ions corresponding to the N-terminal cleavage of Pro but weak ones for C-terminal Pro except where Pro is the second residue from the peptide N terminus, and (d) all the major intense ions interpretable with ProteinProspector (prospector.ucsf.edu/).
For the proteins matched with four or five peptides, at least one MS/MS spectrum was validated manually to meet the identification criteria. Proteins covered with two to three peptides were accepted only when a minimum of two spectra passed the manual validation criteria. Further description of the criteria used for protein identification is given under "Results."
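The automated part of these acceptance rules (charge-dependent Xcorr thresholds, the ΔCn cutoff, and the requirement for two peptide ions per protein) can be expressed compactly. The sketch below is not DTASelect itself; the dictionary keys used for each peptide-spectrum match are hypothetical, and the manual validation step is not modeled.

```python
from collections import defaultdict

# Thresholds taken from the text; Xcorr must be larger than these values.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.08   # matches with smaller ΔCn are removed
MIN_PEPTIDE_IONS = 2  # peptide ions required per protein


def passes_filter(psm: dict) -> bool:
    """Apply the Xcorr and ΔCn criteria to one peptide-spectrum match."""
    threshold = XCORR_MIN.get(psm["charge"])
    if threshold is None:
        return False  # only +1, +2, and +3 ions are considered
    return psm["xcorr"] > threshold and psm["delta_cn"] >= DELTA_CN_MIN


def identified_proteins(psms: list) -> set:
    """Return proteins supported by at least two distinct passing peptide ions."""
    ions_by_protein = defaultdict(set)
    for psm in psms:
        if passes_filter(psm):
            # A peptide ion is a (sequence, charge) pair.
            ions_by_protein[psm["protein"]].add((psm["peptide"], psm["charge"]))
    return {
        protein
        for protein, ions in ions_by_protein.items()
        if len(ions) >= MIN_PEPTIDE_IONS
    }


if __name__ == "__main__":
    example = [  # illustrative values only, not data from the study
        {"protein": "P1", "peptide": "AAAK", "charge": 2, "xcorr": 2.5, "delta_cn": 0.10},
        {"protein": "P1", "peptide": "CCCR", "charge": 1, "xcorr": 2.0, "delta_cn": 0.09},
        {"protein": "P2", "peptide": "DDDK", "charge": 3, "xcorr": 3.9, "delta_cn": 0.20},
        {"protein": "P2", "peptide": "EEER", "charge": 2, "xcorr": 3.0, "delta_cn": 0.05},
    ]
    print(identified_proteins(example))
```

Proteins that survive this filter with fewer than five peptides would still need the manual spectrum inspection described above before being accepted.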

RESULTS
The soluble and insoluble fractions were digested separately with trypsin, and the peptides generated were first separated off line using strong cation exchange chromatography, typically obtaining 34 SCX fractions. On-line reverse phase chromatography coupled to mass spectrometry was then performed for each fraction. The resulting experimental MS/MS spectra were searched with Sequest software against theoretical MS/MS spectra obtained from the human database.

Identification of Proteins by Sequest Search
A typical cLC-MS/MS analysis of the 34 fractions of the insoluble hair proteome generated >200,000 MS/MS spectra. DTASelect filtering of the Sequest results specifying both "fully tryptic" (both cleavages specific for trypsin) and "half-tryptic" (one cleavage specific for trypsin and the other nonspecific) cleavage provided over 1,700 non-redundant peptides that were mapped to 250-300 proteins. As illustrated in Table I, 310 potential proteins were identified in one individual (JY) when the criterion of accepting half-tryptic peptides was applied.
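The distinction between fully tryptic and half-tryptic peptides can be made concrete with a small classifier. This is a simplified sketch under stated assumptions: trypsin is taken to cleave after every Lys or Arg (the reduced cleavage before Pro is ignored), protein termini count as specific ends, only the first occurrence of the peptide in the protein is examined, and the example sequence is invented.

```python
def tryptic_class(protein: str, peptide: str) -> str:
    """Classify a peptide by how many of its ends match trypsin specificity."""
    start = protein.find(peptide)  # first occurrence only
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    # N-terminal end is specific if preceded by K/R or at the protein N terminus.
    n_specific = start == 0 or protein[start - 1] in "KR"
    # C-terminal end is specific if the peptide ends in K/R or at the protein C terminus.
    c_specific = end == len(protein) or peptide[-1] in "KR"
    labels = {2: "fully tryptic", 1: "half-tryptic", 0: "non-tryptic"}
    return labels[int(n_specific) + int(c_specific)]


if __name__ == "__main__":
    hypothetical_protein = "MAKLSDERGTYPKVNQ"  # invented sequence for illustration
    for pep in ("LSDER", "SDER", "SDE", "VNQ"):
        print(pep, "->", tryptic_class(hypothetical_protein, pep))
```

Accepting half-tryptic peptides enlarges the search space, which is consistent with the higher false positive rates reported for the loose criterion.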
To increase the confidence level of the protein identification, we validated the MS/MS spectra manually as described under "Experimental Procedures." We also compared the number of proteins identified by two different criteria, assuming fully tryptic (tight criterion) and half-tryptic cleavage (loose criterion). As can be seen in Table I, the false positive rates, calculated from the proteins that passed Sequest criteria but failed to pass manual validation, were low (3-8%) when the tight criterion accepting only fully tryptic peptides was used. When half-tryptic peptides were also allowed, false positive rates were significantly increased (7-15%). However, most of the false positives must have been removed by manual validation. We also performed Sequest searches against the reversed sequence database to have another measure of the false positive rate (data not shown). The false positive rates estimated by searching against the reversed sequence database were slightly lower (6-11% for half-tryptic peptides) than those obtained by manual validation. With one exception, none of the false positive proteins could have passed the manual validation.
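A reversed sequence database of the kind used for this check can be built by reversing every protein sequence, so that hits to it are false by construction. The sketch below is only illustrative: the `REV_` accession prefix is a hypothetical convention, and the rate is computed as a simple ratio of flagged to accepted proteins, which is an assumption because the exact formula is not given in the text.

```python
def reverse_database(proteins: dict) -> dict:
    """Build a decoy database by reversing each sequence.

    proteins: mapping of accession -> amino acid sequence.
    The 'REV_' prefix is a hypothetical naming convention.
    """
    return {f"REV_{acc}": seq[::-1] for acc, seq in proteins.items()}


def false_positive_rate(n_flagged: int, n_accepted: int) -> float:
    """Fraction of accepted proteins flagged as false.

    n_flagged: proteins failing manual validation, or decoy proteins
               passing the same filter.
    n_accepted: proteins passing the automated criteria.
    """
    if n_accepted <= 0:
        raise ValueError("n_accepted must be positive")
    return n_flagged / n_accepted


if __name__ == "__main__":
    decoys = reverse_database({"P1": "MKTAYIAK"})  # invented sequence
    print(decoys)
    # Illustrative counts only, not values from Table I.
    print(f"{false_positive_rate(5, 100):.0%}")
```

Reversal preserves amino acid composition and protein length, which is why a decoy peptide can occasionally resemble a real one closely.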
The sole false positive protein in the reversed sequence database search that could potentially escape the manual examination is noteworthy in view of the structural similarity of the peptide ions between the real and reversed databases. The two peptides share very similar amino acid sequence and composition, PAM ox DLFQDR (where superscript ox indicates oxidation (+16 Da)) (real) derived from the sequence of Hsp89-αδN versus APFDLFENR, a reversed sequence. Note that Met ox and Phe have the same residual mass of 147 Da; internal fragment masses of QD and EN are the same, 243 Da; and mass differences between Gln and Glu and Asp and Asn are only 1 Da, which may fall into the mass error range of the three-dimensional ion trap mass spectrometer. Yet the real sequence of PAM ox DLFQDR yielded higher Xcorr than the analogue (2.67 versus 2.25 for doubly charged ion and 2.01 versus 1.93 for singly charged ion).
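The mass coincidences noted above can be checked with nominal (integer) residue masses. The sketch below uses the standard nominal residue masses of the 20 amino acids; the function name and the way modifications are passed in are illustrative choices.

```python
# Standard nominal residue masses (Da) of the 20 common amino acids.
NOMINAL = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def residue_masses(peptide: str, mods: dict = None) -> list:
    """Nominal residue masses; mods maps 0-based position -> mass shift."""
    mods = mods or {}
    return [NOMINAL[aa] + mods.get(i, 0) for i, aa in enumerate(peptide)]


if __name__ == "__main__":
    real = residue_masses("PAMDLFQDR", {2: 16})  # Met at position 3 oxidized
    decoy = residue_masses("APFDLFENR")          # reversed-database peptide

    print("Met ox vs Phe:", real[2], decoy[2])
    print("QD vs EN:", real[6] + real[7], decoy[6] + decoy[7])
    print("total residue mass:", sum(real), sum(decoy))
```

At nominal resolution the two peptides have the same total residue mass, and they differ only in the order of the first two residues and in the Gln/Glu and Asp/Asn pair, which explains why many fragment ions fall within the 1-Da fragment tolerance for both candidates.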
After manual examination of the initial filtered spectra, a second manual filtering process was performed to remove redundant proteins to the extent possible. If a set of peptides matches the sequence of two or more separate proteins, the current version (version 1.9) of DTASelect does not eliminate this redundancy. In addition to removal of such redundancy, at least one unique peptide was manually examined for final validation when a protein exhibited unique peptides so that it could be distinguished from others. When we used the criteria accepting both full and half-tryptic peptides combined with the described manual validation procedure and removal of redundant proteins, the total numbers of proteins identified with high confidence were 211, 247, and 221 for the insoluble fraction obtained from HR, JY, and SM, respectively (Table I).

Proteins Identified from Insoluble Proteome
Keratin and KAP Families-Table II shows the major human hair proteins detected in the insoluble fraction from all three individuals. The proteins in Table II each exhibited at least four unique peptide sequences and are sorted by the sum of unique peptides. As can be seen, seven of the top 20 proteins are types I and II keratins of cuticular origin. Non-redundant MS/MS spectra matched to keratins are almost 50% of the total number of spectra (non-redundant) acquired in two-dimensional LC-MS/MS: 49% in HR (3,470 of 7,040), 43% in JY (4,135 of 9,597), and 51% in SM (3,051 of 5,946).
A total of five KAPs belong to the category of major proteins as illustrated in Table II. These were not as dominant as keratins in abundance assuming the number of MS/MS spectra is proportional to protein amount (data not shown) (21). The complete list of KAPs identified can be found in Supplemental Table 3. However, KAPs are likely under-represented because they are relatively small proteins (10-15 kDa) containing limited numbers of trypsin cleavage sites that could result in poor sequence coverage and prevent identification by the MudPIT approach. We have attempted using various endopeptidases including subtilisin A and/or chemical cleavage (CNBr) to improve the sequence coverage without much success.
TABLE I Number of human hair proteins detected from the insoluble fraction of three individuals and false positive rates at the given identification criteria
The total number of validated non-redundant proteins from the three samples is 343. False positive rates given in parentheses were calculated from the proteins that passed Sequest criteria but failed to pass manual validation.
a Tight criteria. Accepted fully tryptic peptides only. Xcorr at least 1.9, 2.2, and 3.75 for singly, doubly, and triply charged peptide ions, respectively. ΔCn at least 0.08. Identified with at least two peptides for each protein.
b Loose criteria. Same as tight criteria except half-tryptic peptides were also accepted.
c MS/MS spectra of the proteins that passed loose criteria were subjected to manual validation.
d Redundant proteins were further removed from manually validated proteins. Namely proteins with no unique peptide were removed. For proteins exhibiting unique peptides, at least one was manually examined for final verification.

TABLE II Major human hair proteins identified in the insoluble fractions in all three individuals
At least four unique peptide sequences were identified from each individual. Only one protein entry is listed if several proteins were indistinguishable. The list is sorted in decreasing order of the sum of unique peptides in all three samples (not shown).

Non-keratin Proteins-Desmosomal proteins are among the most abundant non-keratin proteins found in the insoluble hair proteomes. The number of unique peptides observed in desmoplakin exceeds those of keratins, as can be seen in Table II, indicative of their high relative abundances in the hair shaft. Desmoplakin is a large molecular mass (~332 kDa) structural constituent of the cytoskeleton that plays a vital role in keratinocyte adhesion in linking the transmembranous desmosomal cadherins to the cytoplasmic keratin filament network (22). Other desmosomal proteins identified include plakoglobin, plakophilin, and desmoglein 4. These three cell junction proteins provide essential adhesion structures in most epithelia linking the intermediate filament networks of neighboring cells (22). Additional desmosomal proteins not listed in the major protein category in Table II are plakophilin 3, desmocollin 3 (seen in all three individuals), and desmocollin 1 (seen at a low level in one individual) (Supplemental Table 1). Other notable structural proteins include histones: H2A family, H2B family, and HIST1H4F protein. Their relative abundances, particularly histone families H2A and H2B, are high as indicated by the total numbers of MS/MS spectra (548 for H2B-A and 199 for H2A-O, see Supplemental Table 1). Other structural components classified as major proteins (Table II) are tubulin (α and β chains), actin, β-catenin, plectin 6, and lamin A/C. Relatively abundant proteins of unknown physiological function include a hypothetical protein, A030011M19, and four metal (selenium and calcium)-binding proteins. The functions of the latter are not fully understood, but they are found to be expressed widely including in the liver, lung, kidney, and mammary gland (23). Localization of S100 calcium-binding protein family members exhibits specificity in the hair follicle.
For example, S100A2 is found in the outer root sheath, S100A3 is found in the cortex and cuticle, and S100A6 is found in the inner root sheath (24). Accordingly we found S100 calcium-binding protein A3 to be highly abundant (the number of MS/MS spectra ranges from 52 to 160 among three samples), but S100 A2 and A6 were not detected. It is also worth noting that S100 A14 was found in all three samples with lower abundance (Supplemental Table 1); this is a form previously known to be highly expressed in colon and moderately expressed in thymus, kidney, liver, small intestine, and lung (25) but not known in hair. Along with the S100 calcium-binding protein family, the calcium-binding proteins calmodulin-like protein 3 and annexin A2 were also detected, consistent with the need for calcium ions in the proper function of desmosomal cadherin (e.g. desmogleins) in the living cells of the hair follicle.
Detection of transglutaminase E (TGM3) as a major protein found in all three individuals attests to its likely high importance in formation of Nε-(γ-glutamyl)lysine cross-links of protein substrates in hair. This enzyme was relatively abundant judging from the number of assigned MS/MS spectra (ranging from 25 to 39) in samples from all three individuals. Another enzyme involved in protein cross-linking, transglutaminase K (TGM1), was detected in two samples (Supplemental Table 1) with lower abundance.
c Histone 2B families A and Q (IPI00020101 and IPI00003935) are highly homologous, sharing 10-12 peptides, and hence are combined as a single entry.
d IPI00387144 (α-tubulin, ubiquitous), found only in sample HR with a single unique peptide, is highly homologous to tubulin α-3, sharing six peptides. The peptides from this protein have been treated as if derived from tubulin α-3.

genase. 7) Enzymes involved in protein or amino acid metabolism: protein-arginine deiminase type III responsible for the conversion of arginine residues to citrullines in trichohyalin. 8) Proteolytic enzymes: lysozyme g-like protein, bleomycin hydrolase, and cathepsin D. 9) Excision of carbohydrate: sialidase 2. 10) Unknown function: LAP3 protein, LGALS3, and several hypothetical proteins just to mention a few.

Proteins Identified from Soluble Fraction
Two-dimensional LC-MS/MS analysis of soluble fraction proteins was also performed for one of the hair samples (HR) as shown in Supplemental Table 2. Despite searching ~140,000 MS/MS spectra with Sequest, only 37 proteins were identified, and 23 of them (62%) were keratins and KAP family proteins. Of 3,751 identified non-redundant MS/MS spectra, 3,678 (98%) were from keratins and KAPs. Hence it is likely that the soluble fraction contains mostly disulfide-cross-linked keratins and KAPs extractable with SDS-DTE. All the minor non-keratin proteins observed in the soluble fraction were detected in the insoluble fraction as well. These non-keratin proteins may have formed disulfide bridges with keratins and KAPs before extraction or were trapped in the keratin matrix without participating in cross-links.

Cellular Location and Functional Classification of Proteins Cross-linked by Transglutaminases
The intracellular locations of the identified proteins are shown in Fig. 1. As illustrated, the majority of proteins identified (47.8%) originated from the cytoplasm. The next three significant locations were the nucleus (13.1%), plasma membrane (7%), and mitochondria (6.4%). The original location of ~11% of the detected proteins is unknown.
As can be seen in Fig. 2 the proteins have a diversity of functions from structural (21.3%) to DNA/RNA/protein synthesis (15.5%), metabolism (19.8%), and signaling (14%). Minor but still significant functions include transport (4.7%), protein targeting (3.8%), glycolysis (3.8%), cell growth/maintenance (4.7%), and cell death/defense (2%). Proteins of unknown function correspond to about 6.4%. The observed wide cellular functional distribution of proteins reflects the wide range of proteins that can serve as substrates for transglutaminases.

Posttranslational Modification (PTM) of Human Hair Proteins
To find PTMs, we constructed a subprotein database comprised of all the identified proteins from the three hair samples. All MS/MS spectra were searched again several times with Sequest against this much smaller human hair protein database. As a start, four PTMs were analyzed: ubiquitination (delta mass of +114 Da on lysine), methylation (+14 Da on lysine, arginine, and histidine), trimethylation/acetylation and dimethylation (+42 Da on lysine, +28 Da on lysine), and phosphorylation (+80 Da on serine, threonine, and tyrosine). Because inclusion of these four PTMs in searches would increase the risk of false positives, very tight criteria were used to ensure a high level of confidence in identification: only fully tryptic peptides without missed cleavages (except at modified lysine and/or arginine) and with Xcorr >1.9 for singly charged, >2.2 for doubly charged, and >3.75 for triply charged peptide ions were accepted. Furthermore all MS/MS spectra meeting these criteria were manually examined with higher standards than used for protein identification: no unassignable major peaks were allowed, and most y and b series ions had to be observed. In this way, we were able to detect examples of ubiquitination, methylation, and dimethylation. Fig. 3a shows the MS/MS spectrum of a peptide derived from ubiquitinated ubiquitin. The ubiquitinated peptide, LIFAK GG QLEDGR (where superscript GG represents the C-terminal Gly-Gly residues of ubiquitin), is derived by trypsin digestion of ubiquitin itself. This signature peptide (28) arises from the oligomerization of ubiquitin that initiates protein degradation by the proteasome. It was detected in hair samples from all three individuals, and its high abundance (which makes it readily detectable) likely indicates the prevalence of ubiquitin-tagged proteins in the proteome.

hprd.org/protein) and in some cases the human heart mitochondrial proteome (49). In the latter, subclasses of DNA synthesis, protein modification, transcription, and translation are all classified as DNA/RNA/protein synthesis.
Using stringent criteria, we detected methylation of the 12 proteins shown in Table III. Besides histones, well known for this modification (29, 30), conclusive evidence was obtained for methylation/dimethylation of eight other hair shaft proteins: calmodulin-like protein 3, desmoplakin, sialidase 2, actin (cytoplasmic 1), unknown protein (similar to RIKEN cDNA 4732495G21 gene), keratin I-HA2, keratin II-HB2, and keratin II-HB6. Methylation of mouse and bovine actin (cytoplasmic 1) on His-73 is well known (www.expasy.org/). The residue in calmodulin homologous to Lys-115 in calmodulin-like protein 3 is trimethylated (31), but in the current study a single methylation (delta mass of +14) was observed in the latter (Fig. 3b). In the case of keratin II-HB2, a lysine residue in the sequence LQQETNNVKAQR seems to be modified both by dimethylation and trimethylation (or acetylation). To the best of our knowledge, methylation/dimethylation/trimethylation of keratins, desmoplakins, and sialidase 2 have not been reported previously. Even in the histone molecules, none of the methylation sites were known except the two recently reported in histone 3, VTIMPK m DIQLAR (where superscript m indicates methylation (+14 Da)) and EIAQDFK dm TDLR (where superscript dm indicates dimethylation (+28 Da)) (32).

DISCUSSION
In this extensive MudPIT examination of the human hair shaft proteome, we identified a total of 343 proteins from three hair samples. However, the possibility must be considered that some proteins present in the hair escaped detection. The huge dynamic range of protein abundance in cells and the insolubility of many membrane proteins in general are limiting factors. In addition, the unknown digestibility of proteins extensively cross-linked by Nε-(γ-glutamyl)lysine and the solubility of the peptides contribute to the uncertainty. These factors help rationalize the finding that of the total 343 proteins identified only 143 were seen in samples from all three individuals. As an example, trichohyalin, a major constituent of cross-linked material in the hair shaft medulla (1), was detected in only one individual proteome (JY) in this study (Supplemental Table 1). Its under-representation is likely due to its high degree of cross-linking, the extensive conversion of constituent arginine to citrulline residues, resulting in fewer trypsin cleavage sites, and the low proportion of protein in human hair arising from the single intermittent column of medulla cells at the center of the shaft.
The hair shaft is comprised of a surprisingly large number of intracellular proteins. The SDS-DTE-soluble proteins consist primarily of a moderately complex but well studied mixture of keratins and KAPs. "Hard" keratins are the most abundant structural proteins in hair. Encoded by a large multigene family in two major sequence groups ("acidic" or type I and "basic" or type II), analogous to the "soft" keratins found in the epidermis, they form a cytoplasmic network of 8-10-nm-diameter intermediate filaments. Their expression is coordinated with that of smaller matrix components such as the associated cysteine-rich and glycine/tyrosine-rich KAP fami-

TABLE III Posttranslational modification of mono-, di-, and trimethylation of human hair proteins identified from the insoluble fraction
Three cLC-MS/MS data sets from insoluble fractions were re-searched against 343 identified proteins with possible ubiquitination (+114 on Lys), methylation (+14 on Lys, Arg, His), trimethylation/acetylation and dimethylation (+42 on Lys, +28 on Lys, Arg), and phosphorylation (+80 on Ser, Thr, Tyr). Very tight criteria were used (fully tryptic peptides, no missed cleavage except modified Lys or Arg, Xcorr ≥ 1.9, 2.2, and 3.75 for +1, +2, and +3 ions, respectively), and matches were manually validated with high standards (no unassignable significant intensity peaks were allowed, and most y and b series ions should be identified). Observed modifications on amino acids are denoted with the following superscript symbols: m, methylation (+14 Da); dm, dimethylation (+28 Da); tm, trimethylation/acetylation (+42 Da) (this assignment is tentative because it cannot be distinguished from acetylation (+42 Da), and also the possibility of carbamylation cannot be totally eliminated); ox, oxidation (+16 Da). Bold indicates posttranslationally modified amino acid residues.

A novel finding in the present work is the methylation, dimethylation, and trimethylation observed in the keratins. These modifications are all in the central domain dominated by α-helical subsegments in contrast to other modifications (glycosylation, phosphorylation, and isopeptide bonding) located in the N- or C-terminal domains (33). Some of the observed protein methylation sites show strong sequence homologies, including LASYLTRVR (Arg-113 in keratin H2A) versus FASFINKVR (Lys-137 in keratin HB2) and KYEEEVSLR (Lys-193 in keratin HB6) versus KYEEELSLR (Lys-207 in keratin HB2), where matching residues are shown in bold. Lys-115 in calmodulin is known as a site of trimethylation, and this lysine residue is found within a highly conserved six-amino acid loop (LGEKLT) that forms a turn between helix-6 and -7 (31). A very similar six-amino acid loop was also found in calmodulin-like protein 3 (Table III), LGEKLS, showing conservative replacement of Thr with Ser. Hence methylation sites seem to have sequence motifs in keratins and calmodulins. As shown in Table III, methylation was observed not only in keratins and calmodulin-like protein 3 but also in desmoplakin, sialidase 2, actin, and an unknown protein. We also found extensive methylation on histones (Table III). It is worth noting that methylation but not dimethylation on residue Lys-122 (VTIMPK m DIQLAR and VTIM ox PK dm DIQLAR) was observed in a recent extensive investigation of histone 3 methylation (32). The biological significance of the newly discovered methylation/dimethylation/trimethylation sites on keratins, calmodulin-like protein 3, desmoplakin, sialidase, actin, and histones remains to be clarified.
Very careful interpretation of MS/MS spectra was required in characterization of methylation and/or dimethylation sites because the small mass changes make it difficult to distinguish these modifications from possible point mutations (or isoforms). With the lower resolution ion trap mass spectrometer used for the present study, for example, it is not possible to differentiate a dimethylated Lys residue from replacement of Lys by Arg; with nominal masses of 156 Da, the two choices differ by only 0.025 Da. Other possible amino acid replacements in the surrounding sequence of a peptide should be considered when y or b ions of the corresponding methyl-Lys are not observed (e.g. GK m has the same residual mass as AK). Possible changes of 14 or 28 Da when one amino acid is replaced by another include: Gly to Ala, Ser to Thr, Val to Leu or Ile, Thr to Asp, Asn to Gln or Lys, and Asp to Glu (+14 Da) and Ala to Val, Ser to Asp, and Thr to Glu (+28 Da). Replacement by two amino acids should also be considered: Lys m by AA (142 Da), Arg m by GL/LG or AV/VA (170 Da), Lys dm by GV/VG (156 Da), Arg dm by AL/LA or SP/PS (184 Da), and Lys tm (where superscript tm indicates trimethylation/acetylation (+42 Da); this assignment is tentative because it cannot be distinguished from acetylation (+42 Da), and also the possibility of carbamylation cannot be totally eliminated) by GL/LG or AV/VA (170 Da). To eliminate the above possibilities, extensive BLAST searches were carried out with all peptides in Table III as the query sequences with plausible replacements for the residues in bold. The BLAST results did not list any sequences with such single amino acid replacements. In addition, as can be seen in the MS/MS spectra of posttranslationally modified peptides (Supplemental Fig. 1), the observed y and b series ions countered such possibilities.
The novel observation of Arg double methylation (LASYLTR(dm)VR) could also be interpreted as a single methylation of the homologous peptide (LASYLDR(m)VR) from cytokeratin 18. However, the y3 and b6 ions corresponding to the R(dm)VR and LASYLT fragments were clearly observed in its MS/MS spectrum, whereas those for LASYLDR(m)VR were absent (Supplemental Fig. 1), which gives confidence in the assignment of double methylation of Arg.
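The reason these two fragment ions are diagnostic can be illustrated numerically. The sketch below is a hedged example, not the software used in this work: it assumes standard monoisotopic masses, singly charged fragments, and hypothetical helper names. It shows that the two candidate peptides have nearly identical total masses, while their b6 and y3 ions each differ by about 14 Da.

```python
# Standard monoisotopic residue masses (Da) for the residues needed here.
MONO = {
    "A": 71.03711, "D": 115.02694, "L": 113.08406, "R": 156.10111,
    "S": 87.03203, "T": 101.04768, "V": 99.06841, "Y": 163.06333,
}
METHYL = 14.01565   # mass added per methyl group
WATER = 18.01056
PROTON = 1.00728


def residue_mass(aa, n_methyl=0):
    """Mass of one residue carrying n_methyl methyl groups."""
    return MONO[aa] + n_methyl * METHYL


def peptide(seq, methyl_site, n_methyl):
    """List of (residue, methyl count); methyl_site is a 0-based index."""
    return [(aa, n_methyl if i == methyl_site else 0) for i, aa in enumerate(seq)]


def neutral_mass(pep):
    """Monoisotopic mass of the intact, uncharged peptide."""
    return sum(residue_mass(aa, n) for aa, n in pep) + WATER


def b_ion(pep, n):
    """Singly charged b ion containing the first n residues."""
    return sum(residue_mass(aa, k) for aa, k in pep[:n]) + PROTON


def y_ion(pep, n):
    """Singly charged y ion containing the last n residues."""
    return sum(residue_mass(aa, k) for aa, k in pep[-n:]) + WATER + PROTON


dimethyl = peptide("LASYLTRVR", methyl_site=6, n_methyl=2)    # LASYLTR(dm)VR
monomethyl = peptide("LASYLDRVR", methyl_site=6, n_methyl=1)  # LASYLDR(m)VR

if __name__ == "__main__":
    for name, pep in [("LASYLTR(dm)VR", dimethyl), ("LASYLDR(m)VR", monomethyl)]:
        print(name, round(neutral_mass(pep), 3),
              "b6", round(b_ion(pep, 6), 3), "y3", round(y_ion(pep, 3), 3))
```

Because the Thr-to-Asp difference and the extra methyl group offset each other, only fragments that separate position 6 from position 7 distinguish the two assignments.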
A striking finding is that the proteins comprising the hair shafts are derived from nearly the entire cell. This finding can be rationalized by ultrastructural examination of hair after extensive detergent extraction, permitting visualization of the intricate remaining features of each cell type (6). Most germane, little loss of protein content is evident by extraction of cuticle cells. The marginal band (A layer) at the outer border, the finely textured material in the outer half (exocuticle), and the amorphous large grained material in the inner half (endocuticle) appear largely unaffected. Large granules in the endocuticle, some of which resemble mitochondria in appearance, are reminiscent of the "dust bin hypothesis" for the origin and variability of keratinocyte-cross-linked envelopes (34) and could account for the variety of proteins observed. By contrast, extracted cortical cells are largely devoid of internal content. This observation indicates that the keratins are largely solubilized, but the cross-linking of a substantial fraction into insoluble material is evident biochemically and can be important functionally in epidermal cells (35). This phenomenon likely reflects the close association of the keratins with a variety of linker proteins (including desmoplakin, plakophilin, plakoglobin, and desmoglein 4) that facilitate connecting the cytoskeleton with desmosomes and whose absence has pathological consequences (36). In the extracted hair, cortical cell boundaries are clearly visible, indicative of at most only low level cross-linking of the cytoskeleton with junctional proteins that are prominent in the tabulation. Consistent with characterization of the role of trichohyalin as a transglutaminase substrate (37), the medulla cells where it is largely found exhibit large deposits of insoluble material, including nuclear remnants, after extraction. 
This material could be a source of proteins associated with chromatin, such as histones, although the contribution of the medulla to the total material presumably is much lower for human hair than for mouse hair in view of the prominence of the medulla in the latter (38).
Among the common desmosomal junctional proteins, only desmocollins seem to be lacking in the hair shaft. Whether this has functional significance or simply reflects detection difficulties is uncertain. In view of the variable protein composition of desmosomes in the hair follicle (39) and the possibility that remodeling may occur, as reported for inner root sheath desmocollin (40), the paucity of desmocollins could influence adhesiveness among cells of the hair shaft. Because expression of members of this family is important for regulating intercellular adhesion (41,42), the possibility that hair cell adhesive properties are modulated by substitution or augmentation with another protein may merit consideration. This speculation is prompted by the observation that the hypothetical protein A030011M19 appears highly abundant (764 MS/MS), in this regard second only to desmoplakin (1,114 MS/MS) outside the keratins. Although of unknown function and with little sequence homology to other proteins in GenBank™, it is highly conserved among mammals and appears to be an integral membrane protein according to Kyte-Doolittle hydropathy plotting (not shown).
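As an illustration of the kind of calculation behind a hydropathy plot, the sketch below computes a sliding-window Kyte-Doolittle profile. The 19-residue window and the 1.6 threshold are values commonly used to screen for membrane-spanning segments and are assumptions here, not parameters reported in this work; the function names are hypothetical, and the test sequence is synthetic rather than the A030011M19 sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq (empty if seq is too short)."""
    if window > len(seq):
        return []
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_membrane_windows(seq, window=19, threshold=1.6):
    """Start indices (0-based) of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    # Synthetic example: a 19-residue poly-Leu stretch flanked by charged residues.
    demo = "DEKR" * 5 + "L" * 19 + "DEKR" * 5
    print(candidate_membrane_windows(demo))
```

A protein with one or more windows above the threshold would be a candidate integral membrane protein, which would still need confirmation by other means.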
The high abundance of keratins in the detergent-insoluble material in the hair shaft probably reflects their well known and extensive interactions with other cellular components, including those in intercellular junctions and even the nuclear periphery (33,43). Because the keratins are largely extractable from cells of the cortex without solubilizing the cell borders (6), keratins in the insoluble cross-linked material must be derived largely from cells of the cuticle and medulla. Which proteins are actually connected directly is not known, but keratins are likely connected to a large number of their neighbors and may even require other proteins, particularly Gln donors, to participate extensively in the cross-linking through Lys residues. Isolating cross-linked peptides would permit assignment of partners and their sites of interaction. This information would also be of interest in view of the abundance of chromatin proteins such as histones H2A and H2B in the insoluble material. The finding that keratins in situ can be cross-linked chemically to DNA (44) raises the question whether they could be cross-linked enzymatically to nuclear matrix or histones. Although a responsible isozyme has not yet been identified, transglutaminase cross-linking in the nucleus likely accounts for the nuclear deposits in cells of the medulla, a phenomenon also evident in corneocytes of the nail plate (45).
Although the extraction of cuticle protein is extensive in hair from individuals afflicted with TGM1-negative familial recessive lamellar ichthyosis, considerable material resists extraction (46). This finding suggests that other transglutaminase activity participates in the stabilization process. Observation of TGM3 protein among those identified in all three hair samples implicates this isozyme as an important contributor to the cross-linking. Intuitively, because of its largely membrane-bound localization, TGM1 would be expected to cross-link proteins at the cell periphery, whereas the soluble TGM3 would attach cytoplasmic proteins to each other. The complementary action of two (or more) such enzymes with differing substrate specificities would maximize the resulting stabilization. From this perspective, deficiency in TGM1 activity could easily account for the lack of a marginal band at the outer edge of cuticle cells in hair from a TGM1-deficient patient with congenital ichthyosiform erythroderma (47). However, the envelope-like margins of cortical cells observed even in cases of severe lamellar ichthyosis (TGM1-negative) indicate these are not TGM1-dependent. Moreover the activity leading to stabilization of nuclear material remains uncertain because the appearance of medulla cells from such patients has not been reported.
Present findings offer the prospect of several valuable directions for future investigation. Refinements in the present results include improving the yield of peptides from technically difficult proteins such as the KAPs. Although the location in the mature hair of many of the proteins identified can be inferred from information in the living cells, these locations could be confirmed by immunohistochemistry, possibly assisted by preparations of individual cell types (especially cuticle). Because the cross-linking specificity of transglutaminases relies primarily on available Gln residues, these could be subject to mapping to understand whether other proteins in hair serve the function of Gln-rich trichohyalin in the medulla (and involucrin in epidermis) to connect numerous proteins by virtue of their Lys residues or possibly with polyamines (48). Moreover noninvasive diagnostic applications may become evident from examination of hair samples from individuals with certain genetic hair or skin diseases such as the ichthyoses. Specific perturbations in hair structure visible after detergent extraction, as in the case of trichothiodystrophy (46) and congenital ichthyosiform erythroderma (47), suggest in addition that further understanding of hair structure will result.

* This work was supported in part by United States Public Health Service Grants R01 AR27130, P42 ES04699, and P30 ES05707.