Research|Articles in Press, 100585

Proteogenomic Features of the Highly Polymorphic Histidine-rich Glycoprotein (HRG) Arose Late in Evolution

  • Yang Zou
    Affiliations
    Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, the Netherlands

    Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, the Netherlands
  • Bas van Breukelen
    Affiliations
    Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, the Netherlands

    Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, the Netherlands
  • Matti Pronker
    Affiliations
    Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, the Netherlands

    Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, the Netherlands
  • Karli Reiding
    Affiliations
    Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, the Netherlands

    Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, the Netherlands
  • Albert J.R. Heck
    Correspondence
    Corresponding author: Albert J.R. Heck
    Affiliations
    Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Padualaan 8, 3584 CH Utrecht, the Netherlands

    Netherlands Proteomics Center, Padualaan 8, 3584 CH Utrecht, the Netherlands
Open Access | Published: May 25, 2023 | DOI: https://doi.org/10.1016/j.mcpro.2023.100585

      Highlights

      • The histidine-rich glycoprotein (HRG) harbors 5 variants with minor allele frequencies (MAFs) above 10%
      • Proteogenomics is used to probe correlations between these variants in the human population
      • Some mutations are mutually exclusive, others preferentially co-occur
      • When studying HRG, accounting for allelic variants is crucial
      • HRG mutations appeared late in evolution, and led to partial loss of N-glycosylation

      Abstract

      Histidine-rich glycoprotein (HRG) is a liver-produced protein circulating in human serum at high concentrations of around 125 μg/mL. HRG belongs to the family of type-3 cystatins and has been implicated in a plethora of biological processes, although its precise function is still not well understood. Human HRG is a highly polymorphic protein, with at least 5 variants with minor allele frequencies (MAF) of more than 10%, which vary between populations from different parts of the world. Considering these 5 mutations, we can theoretically expect 3^5 = 243 possible genetic HRG variants in the population. Here, we purified HRG from serum of 44 individual donors and investigated by proteomics the occurrence of different allotypes, each being either homozygous or heterozygous at each of the 5 mutation sites. We observed that some mutational combinations in HRG were highly favored, while others were apparently missing, although they ought to be present if the 5 mutation sites assorted independently. To further explore this behavior, we extracted data from the 1000 Genomes Project (n ∼ 2500 genomes) and assessed the frequency of different HRG mutants in this larger dataset, observing a prevailing agreement with our proteomics data. From all the proteogenomic data we conclude that the 5 different mutation sites in HRG do not occur independently: several mutations at different sites are fully mutually exclusive, whereas others are highly intertwined. Specific mutations also affect HRG glycosylation. As the levels of HRG have been suggested as a protein biomarker in a variety of biological processes (e.g., aging, COVID-19 severity, severity of bacterial infections), we conclude that the highly polymorphic nature of the protein needs to be considered in such proteomics evaluations, as these mutations may affect HRG’s abundance, structure, post-translational modifications, and function.

      Graphical abstract

      List of used abbreviations:

      HRG (histidine-rich glycoprotein), MAF (minor allele frequency), HRR domain (histidine-rich domain), PRR1 and PRR2 domain (proline-rich domains), N domain (N-terminal domain), C domain (C-terminal domain), AHSG (α2-Heremans Schmid-glycoprotein), FETUB (fetuin-B), KNG (kininogen), aPTT (activated partial thromboplastin time), EUR (European), SAS (south Asian), AFR (African), AMR (American), EAS (east Asian), IMAC (immobilized metal affinity chromatography), A1AT (alpha-1-antitrypsin), SDC (sodium deoxycholate), TCEP (tris(2-carboxyethyl)phosphine), CAA (chloroacetamide), HCD (higher-energy collisional dissociation), GWAS (genome-wide association studies), NWO (Netherlands Organization for Scientific Research)


      Introduction

      Next to the extremely abundant albumin and immunoglobulins, several other proteins are fairly abundant in serum, among them the histidine-rich glycoprotein (HRG). HRG is a glycoprotein mostly produced in the liver that circulates in human serum at concentrations of approximately 100-150 μg/mL. Structurally, the HRG protein consists of 525 amino acids spanning 6 annotated domains: 2 N-terminal domains (N1 and N2 domain), 2 proline-rich regions (PRR1 and PRR2 domain), a histidine-rich domain (HRR domain) and a C-terminal domain (C domain)(
      • Poon I.K.H.
      • Patel K.K.
      • Davis D.S.
      • Parish C.R.
      • Hulett M.D.
      Histidine-rich glycoprotein: the Swiss Army knife of mammalian plasma.
      ) (Figure 1a). HRG belongs to the family of type-3 cystatins together with other fairly abundant serum proteins such as α2-Heremans Schmid-glycoprotein (AHSG), fetuin-B (FETUB), and kininogen (KNG)(
      • Lee C.
      • Bongcam-Rudloff E.
      • Sollner C.
      • Jahnen-Dechent W.
      • Claesson-Welsh L.
      Type 3 cystatins; fetuins, kininogen and histidine-rich glycoprotein.
      ). These four proteins are structurally and functionally related, and their genes are located close to each other on chromosome 3. They all share sequence and structural homology, and all contain 2-3 cystatin domains. FETUB is a particularly close paralog of HRG; alignment of these two genes reveals around 35% identity, and the two proteins are also structurally very alike (Figure 1b)(
      • Lee C.
      • Bongcam-Rudloff E.
      • Sollner C.
      • Jahnen-Dechent W.
      • Claesson-Welsh L.
      Type 3 cystatins; fetuins, kininogen and histidine-rich glycoprotein.
      ,
      • Waterhouse A.
      • Bertoni M.
      • Bienert S.
      • Studer G.
      • Tauriello G.
      • Gumienny R.
      • Heer F.T.
      • de Beer T.A.P.
      • Rempfer C.
      • Bordoli L.
      • Lepore R.
      • Schwede T.
      SWISS-MODEL: homology modelling of protein structures and complexes.
      ). Unique to HRG are its histidine- and proline-rich regions. The histidine-rich domain enables efficient HRG purification from serum using Ni2+ or Co2+ metal-affinity chromatography(
      • Kassaar O.
      • Schwarz-Linek U.
      • Blindauer C.A.
      • Stewart A.J.
      Plasma free fatty acid levels influence Zn(2+) -dependent histidine-rich glycoprotein-heparin interactions via an allosteric switch on serum albumin.
      ,
      • Mori S.
      • Takahashi H.K.
      • Yamaoka K.
      • Okamoto M.
      • Nishibori M.
      High affinity binding of serum histidine-rich glycoprotein to nickel-nitrilotriacetic acid: the application to microquantification.
      ,
      • Weyrauch A.K.
      • Jakob M.
      • Schierhorn A.
      • Klösgen R.B.
      • Hinderberger D.
      Purification of rabbit serum histidine-proline-rich glycoprotein via preparative gel electrophoresis and characterization of its glycosylation patterns.
      ,
      • Kassaar O.
      • McMahon S.A.
      • Thompson R.
      • Botting C.H.
      • Naismith J.H.
      • Stewart A.J.
      Crystal structure of histidine-rich glycoprotein N2 domain reveals redox activity at an interdomain disulfide bridge: implications for angiogenic regulation.
      ,
      • Colwell M.
      • Ahmed N.
      • Butkowski R.
      Detection of histidine-rich glycoprotein and fibrinogen with nickel-enzyme conjugates: Purification of rabbit HRG.
      ,
      • Patel K.K.
      • Poon I.K.H.
      • Talbo G.H.
      • Perugini M.A.
      • Taylor N.L.
      • Ralph T.J.
      • Hoogenraad N.J.
      • Hulett M.D.
      New method for purifying histidine-rich glycoprotein from human plasma redefines its functional properties.
      ).
      Figure 1. Schematic structure of HRG and its gene variants with MAFs of more than 10%. a) Schematic representation of protein domains of human serum HRG. HRG consists of 6 domains: 2 N-terminal domains (N1 and N2), 2 proline-rich regions (PRR1 and PRR2), a histidine-rich region (HRR), and a C-terminal domain (C). The disulfide bridges are depicted as dashed lines. b) Partial structure and N-glycosylation sites on HRG. The structure of HRG was obtained by SWISS-MODEL homology modeling based on Fetuin-B (6hpv.1.A) (
      • Waterhouse A.
      • Bertoni M.
      • Bienert S.
      • Studer G.
      • Tauriello G.
      • Gumienny R.
      • Heer F.T.
      • de Beer T.A.P.
      • Rempfer C.
      • Bordoli L.
      • Lepore R.
      • Schwede T.
      SWISS-MODEL: homology modelling of protein structures and complexes.
      ,
      • Bienert S.
      • Waterhouse A.
      • de Beer T.A.P.
      • Tauriello G.
      • Studer G.
      • Bordoli L.
      • Schwede T.
      The SWISS-MODEL Repository—new features and functionality.
      ,
      • Guex N.
      • Peitsch M.C.
      • Schwede T.
      Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective.
      ,
      • Studer G.
      • Rempfer C.
      • Waterhouse A.M.
      • Gumienny R.
      • Haas J.
      • Schwede T.
      QMEANDisCo—distance constraints applied on model quality estimation.
      ,
      • Bertoni M.
      • Kiefer F.
      • Biasini M.
      • Bordoli L.
      • Schwede T.
      Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology.
      ). The histidine-rich domain, which does not tend to crystallize, is represented with an oval. The residues of the N-glycosylation sites are shown in dark blue. A glycosylation site at Asn202 is induced by rs9898(30). c) The frequencies of five abundant gene variants of HRG in different subpopulations. The subpopulations are named as follows: EUR: European, SAS: south Asian, AFR: African, AMR: American, EAS: east Asian. d) Gene variants of HRG with MAF of more than 0.1 (i.e., 10% of the population) and their corresponding primate alleles(
      • Chang K.T.
      • Guo J.
      • di Ronza A.
      • Sardiello M.
      Aminode: Identification of Evolutionary Constraints in the Human Proteome.
      ).
      Although HRG is among the most abundant proteins in serum and is fairly well studied, its precise function in serum is still not well defined. This is most likely due to HRG's complex, multi-layered molecular characteristics. To illustrate this, HRG has been reported to interact with a variety of ligands, including heparin(
      • Heimburger N.
      • Haupt H.
      • Kranz T.
      • Baudner S.
      [Human serum proteins with high affinity to carboxymethylcellulose. II. Physico-chemical and immunological characterization of a histidine-rich 3,8S- 2 -glycoportein (CM-protein I)].
      ), phospholipids(
      • Poon I.K.H.
      • Hulett M.D.
      • Parish C.R.
      Histidine-rich glycoprotein is a novel plasma pattern recognition molecule that recruits IgG to facilitate necrotic cell clearance via FcgammaRI on phagocytes.
      ), plasminogen(
      • Lijnen H.R.
      • Hoylaerts M.
      • Collen D.
      Isolation and characterization of a human plasma protein with affinity for the lysine binding sites in plasminogen. Role in the regulation of fibrinolysis and identification as histidine-rich glycoprotein.
      ), fibrinogen(
      • Leung L.L.
      Interaction of histidine-rich glycoprotein with fibrinogen and fibrin.
      ), heme(
      • Katagiri M.
      • Tsutsui K.
      • Yamano T.
      • Shimonishi Y.
      • Ishibashi F.
      Interaction of heme with a synthetic peptide mimicking the putative heme-binding site of histidine-rich glycoprotein.
      ), Zn2+ (
      • Morgan W.T.
      Interactions of the histidine-rich glycoprotein of serum with metals.
      ) and even more proteins and co-factors. This suggests HRG functions as an adaptor molecule that plays a role in numerous important biological processes, such as antiangiogenic activity(
      • Thulin A.
      • Ringvall M.
      • Dimberg A.
      • Kårehed K.
      • Väisänen T.
      • Väisänen M.-R.
      • Hamad O.
      • Wang J.
      • Bjerkvig R.
      • Nilsson B.
      • Pihlajaniemi T.
      • Akerud H.
      • Pietras K.
      • Jahnen-Dechent W.
      • Siegbahn A.
      • Olsson A.-K.
      Activated platelets provide a functional microenvironment for the antiangiogenic fragment of histidine-rich glycoprotein.
      ,
      • Olsson A.-K.
      • Larsson H.
      • Dixelius J.
      • Johansson I.
      • Lee C.
      • Oellig C.
      • Björk I.
      • Claesson-Welsh L.
      A fragment of histidine-rich glycoprotein is a potent inhibitor of tumor vascularization.
      ,
      • Kärrlander M.
      • Lindberg N.
      • Olofsson T.
      • Kastemar M.
      • Olsson A.-K.
      • Uhrbom L.
      Histidine-Rich Glycoprotein Can Prevent Development of Mouse Experimental Glioblastoma.
      ), immune complex clearance(
      • Gorgani N.N.
      • Parish C.R.
      • Easterbrook Smith S.B.
      • Altin J.G.
      Histidine-rich glycoprotein binds to human IgG and C1q and inhibits the formation of insoluble immune complexes.
      ), pathogen clearance(
      • Poon I.K.H.
      • Hulett M.D.
      • Parish C.R.
      Histidine-rich glycoprotein is a novel plasma pattern recognition molecule that recruits IgG to facilitate necrotic cell clearance via FcgammaRI on phagocytes.
      ,
      • Poon I.K.H.
      • Parish C.R.
      • Hulett M.D.
      Histidine-rich glycoprotein functions cooperatively with cell surface heparan sulfate on phagocytes to promote necrotic cell uptake.
      ,
      • Rydengård V.
      • Shannon O.
      • Lundqvist K.
      • Kacprzyk L.
      • Chalupka A.
      • Olsson A.-K.
      • Mörgelin M.
      • Jahnen-Dechent W.
      • Malmsten M.
      • Schmidtchen A.
      Histidine-rich glycoprotein protects from systemic Candida infection.
      ,
      • Gorgani N.N.
      • Smith B.A.
      • Kono D.H.
      • Theofilopoulos A.N.
      (2002) Histidine-rich glycoprotein binds to DNA and Fc gamma RI and potentiates the ingestion of apoptotic cells by macrophages.
      ,
      • Jones A.L.
      • Poon I.K.H.
      • Hulett M.D.
      • Parish C.R.
      Histidine-rich glycoprotein specifically binds to necrotic cells via its amino-terminal domain and facilitates necrotic cell phagocytosis.
      ), coagulation(
      • Tsuchida-Straeten N.
      • Ensslen S.
      • Schäfer C.
      • Wöltje M.
      • Denecke B.
      • Moser M.
      • Gräber S.
      • Wakabayashi S.
      • Koide T.
      • Jahnen-Dechent W.
      Enhanced blood coagulation and fibrinolysis in mice lacking histidine-rich glycoprotein (HRG).
      ), and fibrinolysis(
      • Wakabayashi S.
      • Koide T.
      Histidine-rich glycoprotein: a possible modulator of coagulation and fibrinolysis.
      ).
      In humans, HRG is a highly polymorphic protein. Quite a few mutations in HRG are known in the human population, some of which are highly frequent. The allele frequencies differ somewhat between populations from different parts of the world (Figure 1c); therefore, unless stated otherwise, we use the reported average(
      • Sherry S.T.
      • Ward M.H.
      • Kholodov M.
      • Baker J.
      • Phan L.
      • Smigielski E.M.
      • Sirotkin K.
      dbSNP: the NCBI database of genetic variation.
      ). There are 5 mutation sites in HRG with minor allele frequencies (MAF) of more than 10% (Figure 1d). A single mutation site can lead to 3 variants, with individuals being either homozygous (AA or aa) or heterozygous (Aa/aA). With HRG harboring 5 frequently mutated sites, 3^5 = 243 possible genetic variants could theoretically be present in the human population. These distinct mutations may affect the structure and function of HRG. For instance, the His340Arg substitution has been associated with divergent serum HRG levels(
      • Hennis B.C.
      • van Boheemen P.A.
      • Wakabayashi S.
      • Koide T.
      • Hoffmann J.J.
      • Kievit P.
      • Dooijewaard G.
      • Jansen J.G.
      • Kluft C.
      Identification and genetic analysis of a common molecular variant of histidine-rich glycoprotein with a difference of 2kD in apparent molecular weight.
      ). The MAF of the Pro204Ser substitution is also very high, but quite divergent between populations, namely around 33% for Ser in Europeans but 63% in African populations (Figure 1c). This particular mutation has been associated with aging and reported as a predictor of the risk of mortality(
      • Hong M.-G.
      • Dodig-Crnković T.
      • Chen X.
      • Drobin K.
      • Lee W.
      • Wang Y.
      • Edfors F.
      • Kotol D.
      • Thomas C.E.
      • Sjöberg R.
      • Odeberg J.
      • Hamsten A.
      • Silveira A.
      • Hall P.
      • Nilsson P.
      • Pawitan Y.
      • Uhlén M.
      • Pedersen N.L.
      • Hägg S.
      • Magnusson P.K.
      • Schwenk J.M.
      Profiles of histidine-rich glycoprotein associate with age and risk of all-cause mortality.
      ). In addition, the Pro204Ser substitution has been associated with activated partial thromboplastin time (aPTT), which is linked to the risk of thrombosis and coagulation disorders(
      • Brunel H.
      • Massanet R.
      • Martinez-Perez A.
      • Ziyatdinov A.
      • Martin-Fernandez L.
      • Souto J.C.
      • Perera A.
      • Soria J.M.
      The Central Role of KNG1 Gene as a Genetic Determinant of Coagulation Pathway-Related Traits: Exploring Metaphenotypes.
      ). Moreover, the Pro204Ser substitution lies in proximity to Asn202, a site that may become N-glycosylated, but only when Ser204 is present(
      • Clerc F.
      • Reiding K.R.
      • Jansen B.C.
      • Kammeijer G.S.M.
      • Bondt A.
      • Wuhrer M.
      Human plasma protein N-glycosylation.
      ).
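      The count of 3^5 = 243 theoretical genotype combinations mentioned above can be reproduced with a short enumeration. The sketch below is illustrative only: the labels "AA", "Aa" and "aa" follow the notation of the text, while the Hardy-Weinberg helper is an added assumption (not a method stated by the authors) that shows how per-site genotype frequencies would follow from a MAF such as the roughly 33% reported for Ser204 in Europeans.

```python
from itertools import product

# Per-site genotype states, using the notation of the text:
# homozygous major (AA), heterozygous (Aa), homozygous minor (aa).
STATES = ("AA", "Aa", "aa")
N_SITES = 5  # the five HRG sites with MAF > 10%


def enumerate_genotypes(n_sites: int = N_SITES):
    """Return every combination of per-site genotype states."""
    return list(product(STATES, repeat=n_sites))


def hardy_weinberg(maf: float):
    """Expected genotype frequencies at one site.

    Assumption for illustration: Hardy-Weinberg equilibrium,
    which the source text does not claim.
    """
    p = 1.0 - maf  # major allele frequency
    return {"AA": p * p, "Aa": 2 * p * maf, "aa": maf * maf}


genotypes = enumerate_genotypes()
print(len(genotypes))  # 243

# Example with the ~33% MAF reported for Ser204 in Europeans.
print(hardy_weinberg(0.33))
```

      If the 5 sites were independent, each of the 243 combinations would have an expected frequency equal to the product of its per-site genotype frequencies; departures from that product are what the pairwise analysis looks for.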
      Here, we set out to purify HRG from about 50 μL of serum originating from 44 individual donors. Using shotgun proteomics, we achieved good sequence coverage, including peptides that covered all 5 distinct mutation sites. Through these measurements we were able to define the HRG mutations present at each of the 5 mutation sites for each donor, and whether the donor was homozygous or heterozygous at that site. We used these data to investigate a potential relationship between the 5 mutations. Investigating first a possible pairwise interplay between two mutation sites, we observed that some mutations at certain sites were mutually exclusive, whereas others apparently reinforced each other. To validate these findings, we next compared our proteomics data with data from the 1000 Genomes Project and monitored the co-occurrence of these 5 mutations in the genomes of n ∼ 2500 humans. The proteomics and genomics data were found to be in good agreement, supporting the notion that the 5 mutation sites cannot be considered independent events. We were also able to confirm that the Pro204Ser substitution leads, when Ser is present, to an additional N-glycosylation site in HRG at Asn202. All these observations display the wealth of genotypes and glycoproteoforms HRG may possess in human serum. Therefore, we argue, when considering HRG as a putative serum protein biomarker(
      • Lee J.
      • Mun S.
      • Park A.
      • Kim D.
      • Lee Y.-J.
      • Kim H.-J.
      • Choi H.
      • Shin M.
      • Lee S.J.
      • Kim J.G.
      • Chun Y.T.
      • Kang H.-G.
      Proteomics Reveals Plasma Biomarkers for Ischemic Stroke Related to the Coagulation Cascade.
      ,
      • Liu Y.
      • He J.
      • Li C.
      • Benitez R.
      • Fu S.
      • Marrero J.
      • Lubman D.M.
      Identification and confirmation of biomarkers using an integrated platform for quantitative analysis of glycoproteins and their glycosylations.
      ,
      • Nishibori M.
      • Wake H.
      • Morimatsu H.
      Histidine-rich glycoprotein as an excellent biomarker for sepsis and beyond.
      ), this natural proteogenomic diversity needs to be accounted for.

      Materials and Methods

      Study cohort

      The 44 serum samples were obtained from two previously gathered cohorts from Amsterdam (the Netherlands) and Bologna (Italy). These serum samples were obtained in accordance with the ethical requirements of the board of Sanquin (Amsterdam, The Netherlands) and the Comitato Etico di Area Vasta Emilia Centro (Bologna, Italy), following the Declaration of Helsinki principles. All donors had given their written informed consent. The serum samples from Italy were from a cohort registered at www.clinicaltrials.gov with the identifier NCT04343053. Whole blood from each individual donor was collected in a vacuette tube (Greiner Bio-One, Kremsmünster, Austria) containing Z serum clot activator. The whole blood was left undisturbed at room temperature for 30-60 min before removal of the clotted material by centrifugation at 1800 × g for 20 min. Individual serum samples were transferred to 1.5 mL Eppendorf tubes as 1 mL aliquots, snap frozen in liquid nitrogen and stored at −80 °C.

      HRG protein purification

      HRG was isolated from human serum by immobilized metal affinity chromatography (IMAC) using a cobalt-loaded resin (Thermo Scientific, Waltham, MA, USA). Briefly, cobalt slurries were washed 3× with binding buffer (Supplementary Table 1). A volume of 50 μL of beads was incubated on a tube revolver with 100 μL serum diluted with 1000 μL binding buffer for 3 h at 4 °C. The beads were subsequently washed with washing buffers 1-7 (Supplementary Table 1). The purified HRG protein was eluted twice with eluting buffer (Supplementary Table 1). The eluates were combined and vacuum dried. Imidazole, which was used for the buffers, was purchased from Sigma-Aldrich (Steinheim, Germany).

      In-solution proteolysis

      Purified HRG was resuspended in 100 mM Tris-HCl and then mixed with digestion buffer containing 200 mM Tris-HCl (pH 8.5), 2% w/v sodium deoxycholate (SDC), 10 mM tris(2-carboxyethyl)phosphine (TCEP) and 60 mM chloroacetamide (CAA). The sample was denatured at 95 °C for 10 min and then incubated in the dark for 45 min. The resulting mixtures were then digested overnight at 37 °C with trypsin (1:30; w/w). The next day, SDC was removed by acid precipitation with 0.5% trifluoroacetic acid (TFA). The peptides were desalted using an Oasis PRiME HLB plate (Waters, Wexford, Ireland), then dried and stored at −80 °C. Tris-HCl, TCEP, CAA, SDC, TFA, and trypsin were purchased from Sigma-Aldrich (Steinheim, Germany).

      Bottom-up proteomics

      Shotgun LC–MS2 was performed using an UltiMate 3000 HPLC system (Thermo Fisher Scientific, Bremen, Germany) coupled to an Orbitrap Exploris mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). About 250 ng of the digested peptides were loaded onto a 2 cm trap column (in-house packed with ReproSil-Pur C18-AQ, 3 μm) (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) coupled to a 50 cm analytical column with a 50 μm inner diameter (in-house packed with Poroshell 120 EC-C18, 2.7 μm) (Agilent Technologies, Amstelveen, The Netherlands). For gradient separation, 0.1% formic acid (v/v) was used as mobile phase A and 0.1% formic acid in acetonitrile (v/v) as mobile phase B. Mobile phase B was increased from 9% to 13% in 1 min, from 13% to 44% in the next 40 min, and from 44% to 99% in 3 min, after which it was maintained at 99% for 4 min. Afterwards, B was decreased to 9% in 1 min and maintained at 9% for 10 min. The flow rate was set to 300 nL/min. Peptides were ionized using a spray voltage of 2 kV in combination with an ion transfer capillary heated to 275 °C. The mass spectrometer was set to acquire full-scan MS spectra (m/z 350–2000) with a maximum injection time of 50 ms at a mass resolution of 120,000. Up to 15 of the most intense precursor ions were selected for tandem mass spectrometry (MS2). Higher-energy collisional dissociation (HCD) MS2 (m/z 120–4000) acquisition was performed in the HCD cell, with readout in the Orbitrap mass analyzer at a resolution of 60,000, a maximum injection time of 50 ms and a normalized collision energy of 29%. For the product-dependent stepping-HCD fragmentation, the collision energies were 10%, 25% and 40%.
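      The gradient described above can be written down as a small table of steps, which makes the total run time and the solvent composition at any time point easy to check. The following is a minimal sketch: the step values are taken from the text, whereas the linear interpolation within each step and the function names are assumptions made for illustration.

```python
# Each step: (start %B, end %B, duration in min), as listed in the text.
GRADIENT_STEPS = [
    (9, 13, 1),    # initial ramp
    (13, 44, 40),  # main separation
    (44, 99, 3),   # ramp to wash
    (99, 99, 4),   # wash
    (99, 9, 1),    # return to starting conditions
    (9, 9, 10),    # re-equilibration
]


def total_run_time(steps=GRADIENT_STEPS):
    """Sum of all step durations in minutes."""
    return sum(duration for _, _, duration in steps)


def percent_b_at(t, steps=GRADIENT_STEPS):
    """%B at time t (min), assuming linear ramps within each step."""
    if t < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in steps:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time exceeds total run time")


print(total_run_time())  # 59
print(percent_b_at(21))  # 28.5
```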

      Proteomics database search and data analysis

      As we expected both non-modified peptides and glycopeptides, we analyzed the raw files using Byonic v4.3.4 (PMi). MS/MS spectra were searched against the full annotated human proteome (Swiss-Prot, release date July 2021, 20,398 entries), complemented with all five dominant mutations of HRG. The search parameters were: fixed modification of cysteine residues (+57 Da), variable modification of methionine oxidation (+16 Da) and full trypsin cleavage, with at most 6 missed tryptic cleavages, 10 ppm error tolerance in MS and 20 ppm error tolerance in MS/MS. False discovery rates were < 1%. When tryptic peptides covering mutation sites were detected, Skyline (v3.7.0.11317) was used to perform a relative quantification. For this, we integrated each variant of the peptides reported by Byonic, which included all major miscleavages and oxidation variants. The resulting integrations were subsequently curated to adhere to the following criteria: 1) ≤ 5 ppm error to the theoretical mass, 2) having an isotope dot product (idotp) of ≥ 80% with the theoretical isotopic pattern, 3) eluting within ±3.5 min of the mean retention time for that peptide, and 4) having no apparent overlapping isotopic patterns.
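      The four curation criteria listed above amount to a simple filter on each integrated peak. The sketch below illustrates that logic under stated assumptions: the `Integration` record, its field names, and the peptide string are hypothetical and are not part of Skyline or Byonic output formats; the thresholds are those given in the text, and the overlap check (criterion 4) is represented as a manually assigned flag.

```python
from dataclasses import dataclass


@dataclass
class Integration:
    """Hypothetical record for one integrated peptide signal."""
    peptide: str
    observed_mz: float
    theoretical_mz: float
    idotp: float                # isotope dot product on a 0-1 scale
    retention_time: float       # minutes
    overlapping_isotopes: bool  # set by manual inspection


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def passes_curation(x: Integration, mean_rt: float,
                    max_ppm: float = 5.0,
                    min_idotp: float = 0.80,
                    rt_window: float = 3.5) -> bool:
    """Apply criteria 1-4 from the text to a single integration."""
    return (
        abs(ppm_error(x.observed_mz, x.theoretical_mz)) <= max_ppm
        and x.idotp >= min_idotp
        and abs(x.retention_time - mean_rt) <= rt_window
        and not x.overlapping_isotopes
    )


# Placeholder peptide; 4 ppm error, idotp 0.95, 1 min from the mean RT.
example = Integration("PEPTIDEK", 500.002, 500.0, 0.95, 20.0, False)
print(passes_curation(example, mean_rt=21.0))  # True
```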

      Thousand genome analysis

      To compare our proteomics data on the different alleles and allele combinations with genomic data, we obtained the 1000 Genomes data (1000 Genomes database, release date October 2015)(
      A global reference for human genetic variation.
      ), including sample ID, genotype, genotype frequency, population(s) and allele counts, from the Ensembl database(
      • Cunningham F.
      • Allen J.E.
      • Allen J.
      • Alvarez-Jarreta J.
      • Amode M.R.
      • Armean I.M.
      • Austine-Orimoloye O.
      • Azov A.G.
      • Barnes I.
      • Bennett R.
      • Berry A.
      • Bhai J.
      • Bignell A.
      • Billis K.
      • Boddu S.
      • Brooks L.
      • Charkhchi M.
      • Cummins C.
      • Da Rin Fioretto L.
      • Davidson C.
      • Dodiya K.
      • Donaldson S.
      • El Houdaigui B.
      • El Naboulsi T.
      • Fatima R.
      • Giron C.G.
      • Genez T.
      • Martinez J.G.
      • Guijarro-Clarke C.
      • Gymer A.
      • Hardy M.
      • Hollis Z.
      • Hourlier T.
      • Hunt T.
      • Juettemann T.
      • Kaikala V.
      • Kay M.
      • Lavidas I.
      • Le T.
      • Lemos D.
      • Marugán J.C.
      • Mohanan S.
      • Mushtaq A.
      • Naven M.
      • Ogeh D.N.
      • Parker A.
      • Parton A.
      • Perry M.
      • Piližota I.
      • Prosovetskaia I.
      • Sakthivel M.P.
      • Salam A.I.A.
      • Schmitt B.M.
      • Schuilenburg H.
      • Sheppard D.
      • Pérez-Silva J.G.
      • Stark W.
      • Steed E.
      • Sutinen K.
      • Sukumaran R.
      • Sumathipala D.
      • Suner M.-M.
      • Szpak M.
      • Thormann A.
      • Tricomi F.F.
      • Urbina-Gómez D.
      • Veidenberg A.
      • Walsh T.A.
      • Walts B.
      • Willhoft N.
      • Winterbottom A.
      • Wass E.
      • Chakiachvili M.
      • Flint B.
      • Frankish A.
      • Giorgetti S.
      • Haggerty L.
      • Hunt S.E.
      • IIsley G.R.
      • Loveland J.E.
      • Martin F.J.
      • Moore B.
      • Mudge J.M.
      • Muffato M.
      • Perry E.
      • Ruffier M.
      • Tate J.
      • Thybert D.
      • Trevanion S.J.
      • Dyer S.
      • Harrison P.W.
      • Howe K.L.
      • Yates A.D.
      • Zerbino D.R.
      • Flicek P.
      Ensembl 2022.
      ) for each variant. All variants were combined per sample ID to obtain a comprehensive overview of all allele combinations per individual. Subsequently, we counted the observed allele combinations and from this calculated their combined frequencies.
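      The per-sample combination and counting step can be sketched as follows. This is a minimal illustration, not the authors' script: the input is assumed to be a flat list of (sample ID, variant ID, genotype) records, and the sample names, variant names ("v1", "v2") and genotype labels in the toy data are hypothetical. The last function adds a comparison that the text implies, namely the frequency each combination would have if the sites were independent.

```python
from collections import Counter, defaultdict


def combine_per_sample(records, variant_order):
    """Group genotype calls by sample and return one tuple per sample.

    Samples missing a call for any variant are skipped.
    """
    by_sample = defaultdict(dict)
    for sample_id, variant_id, genotype in records:
        by_sample[sample_id][variant_id] = genotype
    return {
        sample_id: tuple(calls[v] for v in variant_order)
        for sample_id, calls in by_sample.items()
        if all(v in calls for v in variant_order)
    }


def combination_frequencies(combos):
    """Observed frequency of each allele combination."""
    counts = Counter(combos.values())
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


def expected_if_independent(combos):
    """Product of per-site genotype frequencies for each observed combination."""
    n = len(combos)
    n_sites = len(next(iter(combos.values())))
    site_counts = [Counter(c[i] for c in combos.values()) for i in range(n_sites)]
    expected = {}
    for combo in set(combos.values()):
        prob = 1.0
        for i, genotype in enumerate(combo):
            prob *= site_counts[i][genotype] / n
        expected[combo] = prob
    return expected


# Toy data: two sites whose minor alleles never co-occur.
records = [
    ("s1", "v1", "AA"), ("s1", "v2", "aa"),
    ("s2", "v1", "AA"), ("s2", "v2", "aa"),
    ("s3", "v1", "aa"), ("s3", "v2", "AA"),
    ("s4", "v1", "aa"), ("s4", "v2", "AA"),
]
combos = combine_per_sample(records, ["v1", "v2"])
print(combination_frequencies(combos))
print(expected_if_independent(combos))
```

      In this toy example each observed combination occurs at a frequency of 0.5, twice the 0.25 expected under independence, while the ("AA", "AA") and ("aa", "aa") combinations are absent; this is the kind of pattern described in the text as mutually exclusive or co-occurring mutations.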

      Phylogenetic context

      For each allele we examined the phylogenetic context using the "90 eutherian mammals EPO-Extended" dataset from Ensembl. From these alignments we determined which allele is observed in which species (e.g., primates or non-primates) and where it starts to differ, if at all, from the human reference allele. This information was subsequently used to obtain an indication of how well the allele is conserved and whether it mutated in the relatively recent or the distant past.(
      • Cunningham F.
      • Allen J.E.
      • Allen J.
      • Alvarez-Jarreta J.
      • Amode M.R.
      • Armean I.M.
      • Austine-Orimoloye O.
      • Azov A.G.
      • Barnes I.
      • Bennett R.
      • Berry A.
      • Bhai J.
      • Bignell A.
      • Billis K.
      • Boddu S.
      • Brooks L.
      • Charkhchi M.
      • Cummins C.
      • Da Rin Fioretto L.
      • Davidson C.
      • Dodiya K.
      • Donaldson S.
      • El Houdaigui B.
      • El Naboulsi T.
      • Fatima R.
      • Giron C.G.
      • Genez T.
      • Martinez J.G.
      • Guijarro-Clarke C.
      • Gymer A.
      • Hardy M.
      • Hollis Z.
      • Hourlier T.
      • Hunt T.
      • Juettemann T.
      • Kaikala V.
      • Kay M.
      • Lavidas I.
      • Le T.
      • Lemos D.
      • Marugán J.C.
      • Mohanan S.
      • Mushtaq A.
      • Naven M.
      • Ogeh D.N.
      • Parker A.
      • Parton A.
      • Perry M.
      • Piližota I.
      • Prosovetskaia I.
      • Sakthivel M.P.
      • Salam A.I.A.
      • Schmitt B.M.
      • Schuilenburg H.
      • Sheppard D.
      • Pérez-Silva J.G.
      • Stark W.
      • Steed E.
      • Sutinen K.
      • Sukumaran R.
      • Sumathipala D.
      • Suner M.-M.
      • Szpak M.
      • Thormann A.
      • Tricomi F.F.
      • Urbina-Gómez D.
      • Veidenberg A.
      • Walsh T.A.
      • Walts B.
      • Willhoft N.
      • Winterbottom A.
      • Wass E.
      • Chakiachvili M.
      • Flint B.
      • Frankish A.
      • Giorgetti S.
      • Haggerty L.
      • Hunt S.E.
      • IIsley G.R.
      • Loveland J.E.
      • Martin F.J.
      • Moore B.
      • Mudge J.M.
      • Muffato M.
      • Perry E.
      • Ruffier M.
      • Tate J.
      • Thybert D.
      • Trevanion S.J.
      • Dyer S.
      • Harrison P.W.
      • Howe K.L.
      • Yates A.D.
      • Zerbino D.R.
      • Flicek P.
      Ensembl 2022.
      )

      Polyacrylamide gel electrophoresis

      A total of 15 μL of sample was mixed with loading buffer for protein analysis and heated to 95 °C for 5 minutes. The samples and SDS-PAGE standards (Bio-Rad, California, USA) were loaded into the wells of an SDS-PAGE gel. Electrophoresis was carried out at 160 mA for 1 to 2 hours, until the standards had properly separated. The gel was then removed and placed on a horizontal shaker in a volume of Coomassie G-250 stain solution (Bio-Rad, California, USA) sufficient to cover it fully. The staining solution was then poured off, and the gel was washed with water for 4-24 hours. The water was replaced 3-5 times during this period, until the blue background had almost disappeared and the contrast of the stained protein bands was enhanced.

      Results and Discussion

      Characterization of allelic frequencies in serum proteomics data

      For our proteomics analyses, HRG was purified from 44 individual serum samples obtained, with consent, from donors in several previously collected cohorts; all donors were of European origin (the Netherlands and Italy; see Material and Methods for details). Making use of its unique histidine-rich domain, HRG could be isolated from serum by immobilized metal affinity chromatography (IMAC) using a cobalt-loaded resin (see Material and Methods). After purification of HRG from the individual serum samples, we digested HRG with trypsin. In all 44 serum samples, we carefully inspected the shotgun proteomics data for tryptic peptides covering each of the 5 mutation sites. Both Byonic and Skyline were used to analyze the proteomics data, to provide qualitative and quantitative insight into the presence and relative abundance of the allele variants at the 5 sites in HRG purified from each of the donors. We obtained a typical sequence coverage of approximately 75%, and in most cases we detected tryptic peptides covering each of the 5 mutation sites.
      Since humans are diploid organisms, individuals can be homozygous or heterozygous for genetic variations. Therefore, in heterozygote donors we would expect distinct peptides harboring either the major allele or the alternative allele. In Figure 2, illustrative LC-MS traces are depicted of allele-specific peptides of HRG, obtained from homozygote or heterozygote donors carrying either the Arg448Cys substitution (Figure 2a) or the Asn493Ile substitution (Figure 2b). The full set of such LC-MS traces, covering all mutations for all donors, is provided in Supplementary Figures 1-10. Additionally, MS/MS fragmentation spectra explicitly assigning the different mutations are provided in Supplementary Figure 11. For nearly every mutation, our cohort included donors that were AA homozygote, aa homozygote, or Aa/aA heterozygote. For homozygotes, we counted the allele occurrence twice (once for each of the two gene copies), which allowed us to calculate the frequency of occurrence of each mutation in our cohort of 44 donors. A summary of all data for each mutation in HRG in our cohort of 44 people is provided in Supplementary Data Excel file 1.
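      The counting rule described above (each donor contributes two alleles, so a homozygote counts twice for the same allele) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study; the function name `allele_frequencies` and the four-donor toy cohort are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return the frequency of each allele from a list of diploid genotypes.

    Each genotype is a 2-tuple of alleles, e.g. ("Arg", "Cys"). A homozygote
    such as ("Arg", "Arg") therefore adds two counts to the same allele.
    """
    counts = Counter()
    for first, second in genotypes:
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical toy cohort of four donors (NOT data from this study).
toy_cohort = [("Arg", "Arg"), ("Arg", "Cys"), ("Cys", "Cys"), ("Arg", "Arg")]
freqs = allele_frequencies(toy_cohort)
print(freqs)  # {'Arg': 0.625, 'Cys': 0.375}
```

      In the toy cohort, Arg is counted five times and Cys three times out of eight alleles in total.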
      Figure 2. Gene variants of HRG measured by proteomics. Prototypical LC-MS traces of unique allele-specific peptides detected in serum HRG purified from different donors (Skyline was used for allele classification and quantification) of a) the Arg448Cys substitution and b) the Asn493Ile substitution. The spectra from top to bottom are the LC-MS traces for donors being homozygote AA, heterozygote Aa/aA or homozygote aa. Dark grey: Arg448, light grey: Cys448, dark orange: Ile493, light orange: Asn493. c) The frequencies of gene variants of HRG with MAF of more than 10% as observed in the proteomics data from the 44 donors (left) and, for comparison, as extracted from the European genomics data from the 1000 Genomes Project (right).
      In more detail, in our proteomics dataset (n=44) we observed a ratio of 88%:12% for the Ile180Thr substitution, rs10770 (T>C); 53%:47% for the Pro204Ser substitution, rs9898 (C>T); 60%:40% for the His340Arg substitution, rs2228243 (A>G); 72%:28% for the Arg448Cys substitution, rs1042445 (C>T), with Arg more frequent than Cys; and 41%:59% for the Asn493Ile substitution, rs1042464 (A>T), with Ile dominant (Figure 2c). Although our (fully European) cohort of 44 is relatively small, these ratios are reasonably close to the expected MAFs for the European cohort based on data from the 1000 Genomes Project (Figure 2c). Based on this agreement, we argue that proteomics-based measurement of allelic frequencies in HRG purified by IMAC from the serum of individual donors is feasible.

      Pairwise co-occurrences of allele specific mutations in the proteomics data

      Although our cohort is rather small (n=44), several striking features could be observed in the proteomics data, especially in the co-occurrence of specific mutations. We first focused on pairwise co-occurrences. For instance, donors found at the peptide level to be homozygote for Pro204 (i.e., no peptides with Ser204 were detected) never presented an HRG peptide harboring Ile at position 493, but exclusively Asn493. From this, one may infer that all donors homozygote for Asn493 should also be homozygote for Pro204. Similarly, in HRG of donors in which we exclusively detected peptides with Asn493 (and not Ile493), we also only detected allele-specific peptides carrying Arg448 (and not Cys448). From this, one may argue that all homozygote Asn493 donors should also be homozygote for Arg448. To understand these pairwise correlations, we next accumulated from the proteomics data all pairwise co-occurrences of mutations between position 493 and the other positions (Figure 3a). We also calculated the pairwise co-occurrences expected if all mutations were independent events (based on the European single-site MAFs) (Figure 3b). From the data depicted in Figure 3a and 3b it becomes apparent that most pairwise combinations do not match the expectation based on independent assembly: some pairwise combinations are highly favored and others are absent, even though the independent assembly model predicts them to be present. Of note, we were not able to obtain sufficient evidence to assign the allotype at every mutation site in all 44 HRG samples, due either to low intensities or to mismatches with expected isotope ratios (Supplementary Figures 2, 5, 6, 9, 10). Therefore, only the 32 samples that met our stringent filtering conditions were used for the data analysis. Table 1 illustrates the pairwise combinations that should be observed, considering both homo- and heterozygote occurrences. If both mutations were independent, the frequencies of the pairwise combinations could be calculated by multiplying the frequencies derived from the MAFs of the single mutations.
      Figure 3. Pairwise co-occurrences of allele-specific mutations in HRG between position 493 and the other mutation sites. a) Frequencies observed in the proteomics data (n=32). b) Theoretical frequencies of co-occurrences based on independent assembly. c) Frequencies observed in the European subset from the 1000 Genomes Project.
      Table 1. Co-occurrence analysis of mutations. When considering two mutations, there can be nine pairwise combinations, where A represents the first allele and B represents the second allele.
                BB         Bb         bb
      AA     AA × BB    AA × Bb    AA × bb
      Aa     Aa × BB    Aa × Bb    Aa × bb
      aa     aa × BB    aa × Bb    aa × bb
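      The nine-cell layout of Table 1 and the independent assembly calculation can be illustrated with a short sketch. Two points here are assumptions and not statements from the study: single-site genotype frequencies are derived from allele frequencies under Hardy-Weinberg proportions, and the inputs are the allele ratios observed in the proteomics cohort for positions 204 and 493 (the study itself used European MAFs). The function names are hypothetical.

```python
def genotype_freqs(p, major, minor):
    """Genotype frequencies at one site, ASSUMING Hardy-Weinberg proportions.

    p is the frequency of the `major` allele; 1 - p is that of `minor`.
    """
    q = 1.0 - p
    return {
        major + major: p * p,      # homozygote for the major allele
        major + minor: 2 * p * q,  # heterozygote
        minor + minor: q * q,      # homozygote for the minor allele
    }


def pairwise_expected(site_a, site_b):
    """Expected frequency of all nine genotype pairs if the sites are independent."""
    return {
        (ga, gb): fa * fb
        for ga, fa in site_a.items()
        for gb, fb in site_b.items()
    }


# Illustrative inputs only: Pro204 = 0.53 and Ile493 = 0.59 (proteomics cohort).
site_204 = genotype_freqs(0.53, "P", "S")
site_493 = genotype_freqs(0.59, "I", "N")
expected = pairwise_expected(site_204, site_493)

for (ga, gb), freq in sorted(expected.items(), key=lambda kv: -kv[1]):
    print(f"{ga} x {gb}: {freq:.3f}")
```

      Under independence every one of the nine cells receives a non-zero expected frequency, which is why combinations that are entirely absent from the observed data stand out.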
      Clearly, the proteomics data reveal that some of the 5 mutation sites in HRG show a strong correlation in co-occurrence. However, this finding is based on a relatively small number of donors. Moreover, shotgun proteomics analysis remains a stochastic process, and thus certain peptides may not be consistently detected. On the other hand, since in multiple samples we do observe unique, highly abundant peptides for each variant at the 5 mutation sites, we believe that when these peptides are not detected in samples from other donors, they are either of very low abundance or not present.
      To further examine the finding that pairwise mutations do not assemble independently, we examined the data from the 1000 Genomes Project, which comprises 2504 fully sequenced human genomes, and compared in a pairwise fashion how often mutations in HRG co-occurred (
      A global reference for human genetic variation.
      ,
      Howe K.L. et al. Ensembl 2021.
      ). We used the reported frequencies of the European cohort within the 1000 Genomes Project (n=503, phase 3), as obtained from the Ensembl database (
      Howe K.L. et al. Ensembl 2021.
      ). These extracted data are provided in Supplementary Data Excel file 2. The result of this pairwise analysis is displayed in Figure 3c. At first glance, the agreement between the proteomics data on the small cohort (n=32) and the 1000 Genomes data from the 503 European genomes is very good, especially when compared with the predictions of the independent assembly model. Several pairwise combinations predicted to occur rather frequently by the independent assembly model, for instance homozygote Ser204 with homozygote Asn493, are completely missing in both the proteomics and genomics data. Thus, from the experimental proteomics and genomics data presented in Figure 3, it becomes clear that pairwise combinations of the 5 mutations in HRG do not arise by independent assembly of single-site mutations: some pairwise combinations are highly favored, while others seem not to be present.
      Although the proteomics and genomics data show great overall similarity, we also observed evidence in the proteomics data for a few co-occurring mutations that were absent in the genomics data. For instance, in our proteomics data we find two donors that are heterozygote at position 340 (His/Arg) and homozygote for Asn493, as well as two donors that are homozygote for Arg340 and heterozygote at position 493 (Asn/Ile) (Figure 3a); neither combination is observed in the 1000 Genomes dataset (Figure 3c). Presently, we cannot explain this apparent discrepancy, but we provide the seemingly robust evidence for these co-occurring mutations in Supplementary Figures 12-14.

      Total number of HRG genetic variants present in the human population

      Next, we extended our analysis beyond these pairwise combinations of mutations, calculating the frequency of occurrence of combinations of all 5 mutations in the human population. To reiterate, a single mutation site can lead to 3 variants, with donors being either homozygote (AA or aa) or heterozygote (Aa/aA). With HRG harboring 5 mutation sites, 3^5 = 243 genetic variants of HRG could theoretically be present in the human population. As in the pairwise analyses, we calculated the expected frequency of occurrence of all 243 possible combinations under the assumption that the 5 mutations assemble independently of each other. Additionally, from the full 1000 Genomes dataset (n=2504), we counted the frequency of occurrence of each of these 243 combinations. The outcome of these analyses is depicted in Figure 4a. Interestingly, only about 50 combinations (out of 243) dominate the frequency-of-occurrence spectrum in the genomics data, several of which seem to be extremely enriched compared to what would be expected based on the independent assembly model. Therefore, this analysis also makes clear that the independent assembly model shows substantial discrepancies with the experimental genomics and proteomics data.
      Figure 4. Theoretical (independent assembly) and experimental (1000 Genomes data) co-occurrences of all 5 allele frequencies in HRG. a) Occurrence % of the top 20 abundant combinations based on an independent assembly model (left) and the experimental 1000 Genomes data (right). The y-axis depicts the combinations of most frequent gene variants within HRG, showing pairwise the amino acids at positions 180, 204, 340, 448 and 493. Cartoon figures with different colors represent these different allele combinations. Each figure represents 1% of the population. The pie-chart insets depict the summed relative contribution of the 20 most frequently calculated/observed combinations. b) Proposed evolutionary tree of the 5 mutations in HRG, illustrating why certain combinations of co-occurrences are unlikely to be observed. Each branch represents the most abundant allele combination of HRG in different regions.
      To further highlight these differences, we also depict the frequency of occurrence of the top 20 most abundant allelic combinations of the 5 mutation sites for both the independent assembly model (Figure 4a, left pie chart) and the experimental human genome data (Figure 4a, right pie chart). The two top-20 lists share only nine combinations. The top 20 makes up ∼61% of all occurrences according to the independent assembly model, while the top 20 in the 1000 Genomes data makes up ∼96% of all occurrences. Thus, out of 3^5 = 243 combinations, only a few dozen dominate the palette of allelic occurrences.
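      The enumeration of all 243 combinations under the independent assembly model can be sketched as below. This is an illustration under stated assumptions rather than the calculation performed in the study: genotype frequencies per site are taken from Hardy-Weinberg proportions (an assumption), and the allele frequencies are the ratios from the 44-donor proteomics cohort, used only as example inputs, whereas the study used 1000 Genomes frequencies. The resulting numbers are therefore not expected to reproduce Figure 4a.

```python
from itertools import product

# (major allele, minor allele, major-allele frequency) per position.
# ILLUSTRATIVE inputs: ratios from the proteomics cohort, not 1000 Genomes MAFs.
SITES = {
    180: ("I", "T", 0.88),
    204: ("P", "S", 0.53),
    340: ("H", "R", 0.60),
    448: ("R", "C", 0.72),
    493: ("I", "N", 0.59),
}


def site_genotypes(major, minor, p):
    """Three genotypes at one site with Hardy-Weinberg frequencies (assumption)."""
    q = 1.0 - p
    return [(major + major, p * p), (major + minor, 2 * p * q), (minor + minor, q * q)]


def all_combinations(sites):
    """Map each 5-site genotype label (e.g. II_PP_HH_RR_NN) to its expected frequency."""
    per_site = [site_genotypes(*spec) for spec in sites.values()]
    combos = {}
    for choice in product(*per_site):
        label = "_".join(genotype for genotype, _ in choice)
        freq = 1.0
        for _, f in choice:
            freq *= f  # independence: multiply the single-site frequencies
        combos[label] = freq
    return combos


combos = all_combinations(SITES)
ranked = sorted(combos.items(), key=lambda kv: kv[1], reverse=True)
top20_share = sum(freq for _, freq in ranked[:20])

print(len(combos))  # 243
print(f"share of the 20 most frequent combinations: {top20_share:.1%}")
```

      Observed genotype counts could be ranked in the same way, so that the two top-20 lists and their cumulative shares can be compared directly.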
      The allelic occurrence of the fully homozygote II_PP_HH_RR_NN combination is the most abundant in the 1000 genome data, present in ∼14% of the tested human population. This is a very strong enrichment when compared to the independent assembly model, which predicts that less than 1% of the population should carry this combination. Additionally, many combinations predicted to be frequent based on the independent assembly model do not occur in the 1000 genome data set. For instance, and as already described in the pairwise comparison section, all allelic combinations that comprise a homozygote Ser204 with a homozygote Asn493 are completely missing in the 1000 genome data.
      To investigate whether this is a unique phenomenon for HRG, we performed a similar analysis on the human serum protein alpha-1-antitrypsin (A1AT) that also harbors several well-known mutations(
      Jager S. et al. Proteoform Profiles Reveal That Alpha-1-Antitrypsin in Human Serum and Milk Is Derived From a Common Source.
      ). However, for A1AT we found that the distribution of allelic variants was much more similar between the prediction based on the independent assembly model and the experimental 1000 Genomes data (n=2504) (
      Jager S. et al. Proteoform Profiles Reveal That Alpha-1-Antitrypsin in Human Serum and Milk Is Derived From a Common Source.
      ). All data related to the analyses on the allelic frequencies of HRG and A1AT are provided in Supplementary Data Excel file 3 & 4 respectively and Supplementary Figure 15. In addition, we also compared the European allelic variant distribution from HRG and A1AT between the independent assembly model and the (smaller) European sub-cohort from the 1000 genome project (n=503), provided in Supplementary Data Excel file 5 & 6 and Supplementary Figure 16 & 17, respectively. Also, by using this smaller sub-cohort, we did observe similar discrepancies between the independent assembly model and the genome data, again much more so for HRG than for A1AT. Currently, we lack a good explanation for these distinct proteogenomic features for human HRG and A1AT.

      Enriched observed HRG genotypes are consequences of evolutionary divergence

      We attempted to address the question of why some of the 243 possible HRG genotypes are enriched, diminished or not present at all. We first asked whether some of these 5 mutations might have originated late in evolution and in distinct parts of the world. Such a hypothesis may be tested using the data summarized in Figure 1c. For instance, compared to the global average, Ser204 is much more prevalent in the African and East Asian populations, whereas Pro204 is most dominant in the European and American populations. Also, nearly all Africans carry exclusively His340 (∼97%), whereas in the East Asian population in particular Arg340 occurs at a high frequency (41%). Thirdly, Europeans have relatively high frequencies of Asn493 (52%) compared to some other populations (e.g., 19% in East Asians). This analysis shows that the distribution of HRG variants is strongly affected by the origin of the cohort, which also implies that when suggesting or testing HRG as a protein biomarker in serum proteomics studies, the origin of the cohort needs to be accounted for.
      Next, we aimed to define the “ancestral” variants of HRG using genomics data on HRG from non-human primates as well as non-primates (
      Chang K.T. et al. Aminode: Identification of Evolutionary Constraints in the Human Proteome.
      ). To start with the former, when performing a BLAST search of human HRG against non-human primates, we found that none of these sequences showed variation at 4 of the 5 mutation sites discussed here. In more detail, Ile180, Ser204, His340 and Ile493 are all highly conserved in non-human primates (Figure 1d). Only at position 448 did several non-human primates carry either Arg or His. It seems therefore that most of the mutations in human HRG discussed here occurred very late in evolution. Ile180 as well as His340 are conserved in mammals (Figure 4b). After the mutation at position 448 arose, Arg448 became dominant in primates, while Cys448 is dominant in non-primates. This particular mutation is interesting, as Cys448 could potentially be involved in a disulfide bridge or provide HRG with a free Cys. Unfortunately, our shotgun proteomics data do not enable us to provide evidence for either case.
      The HRG variant Pro204 occurs mainly in the European, American and South Asian populations, while Ser204 is largely conserved in the African and East Asian populations. Of the 5 mutations considered here, the latest to occur was at position 493. Ile493 has become the major allele in all populations except for the Europeans (Figure 4b). This may also hint at a wider diversity of HRG genes and proteins in humans than in non-human primates. From this analysis it appears that most of the mutations in human HRG discussed here originated only recently and only in the human population.
      Initially, when we had only considered the proteomics data, we also explored an additional rationale for why some of the allelic variants are potentially never observed as HRG proteins in blood, namely that a certain allelic variation at the gene level might not lead to viable expression of the protein. To test this hypothesis, we recombinantly expressed in human HEK293 cells four variants of HRG, testing all possible pairwise homozygous combinations for positions 204 and 493: PP204_II493, PP204_NN493, SS204_II493, and SS204_NN493. According to the proteomics and 1000 Genomes data, SS204_NN493 HRG does not occur in nature. Expression of these variants in the HEK293 cells led in all cases to high and, more importantly, equal levels of HRG, demonstrating that the non-natural variants are equally viable (Supplementary Figure 18).

      Functional consequences of specific HRG genotypes

      Our proteogenomics analysis demonstrates that, when analyzing HRG in serum proteomics studies, considering protein levels alone without taking the highly polymorphic nature of HRG into account gives an incomplete picture. Several genome-wide association studies (GWAS) have taken these HRG mutations into account, and from such studies the Pro204Ser substitution, for instance, has been proposed to be associated with aging and to be a predictor of the risk of mortality following bacterial infections(
      Hong M.-G. et al. Profiles of histidine-rich glycoprotein associate with age and risk of all-cause mortality.
      ) and increased risk of thrombosis and coagulation disorders(
      Brunel H. et al. The Central Role of KNG1 Gene as a Genetic Determinant of Coagulation Pathway-Related Traits: Exploring Metaphenotypes.
      ). This Pro204Ser substitution is of particular interest, also at the protein level. Ser204 is downstream of Asn202, and it has been reported that this Asn202 site may become N-glycosylated, but only when a Ser204 is present(
      Clerc F. et al. Human plasma protein N-glycosylation.
      ). In our analysis we could confirm this additional N-glycosylation in two separate experiments. First, following recombinant expression of the four HRG variants, we observed that the variants containing Ser204 migrated on the gel at a higher apparent molecular weight than the two variants containing Pro204 (Supplementary Figure 18). This difference in migration is likely caused by the additional N-glycosylation on Asn202 when Ser204 is present. Second, in our proteomics data we observed abundant and extensive glycosylation on Asn202, but only in the two recombinant samples that contained Ser204. Considering that Ser204 is likely the ancestral allele, it seems that the Pro204 mutation, likely first seen in the European population, led to a partial loss of N-glycosylation in humans when compared to non-human primates. It may therefore be of interest to compare, at the protein and functional level, human HRG with HRG from non-human primates. Thus, the Pro204Ser substitution in HRG is of interest genetically, as it shows great differences in frequency of occurrence among populations, and also from a glycoproteomics point of view, as Pro204 leads to the loss of an N-glycosylation site that seems at least partially occupied when there is a Ser at position 204.

      Concluding remarks

      At the proteogenomics level, considerable and intriguing variation in HRG exists across the human population. Growing evidence suggests that such mutations may affect not only the function of HRG, but possibly also its expression level as well as its serum abundance and half-life(
      Hong M.-G. et al. Profiles of histidine-rich glycoprotein associate with age and risk of all-cause mortality.
      ). In addition, the highly polymorphic nature of HRG makes it more difficult to study in vitro and in vivo: to draw the right conclusions about the protein's role and function, it must be stated which variant was used. Especially in serum proteomics studies of large cohorts, the high variability in human HRG needs to be accounted for: only 14% of individuals are homozygote for the protein sequence that can be considered as “wild type”, a number that will furthermore differ between study populations. Therefore, we conclude that abundance data should not be averaged out in large cohort studies; instead, the nature of the allelic and proteoform variants should be accounted for.

      Data availability

      The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE(
      Perez-Riverol Y. et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.
      ) partner repository with the dataset identifier PXD040914. This article contains supplemental data: Supplementary Figures 1-18, Supplementary Table 1, and Supplementary Data Excel files 1-6.
      Yang Zou: Experimental work, Methodology, Initial Analysis, Writing - Original draft preparation. Bas van Breukelen: Genome Analysis, Advice. Matti Pronker: Structural Analysis and Structural Model. Karli Reiding: Supervision, Data Analysis, Investigation, Data Curation, Writing. Albert Heck: Conceptualization, Experimental Design, Funding Acquisition, Writing - Reviewing and Editing.

      Acknowledgments

      We acknowledge support from the Netherlands Organization for Scientific Research (NWO) funding the X-omics Road Map program (project 184.034.019).

      Supplementary data

      References

        • Poon I.K.H.
        • Patel K.K.
        • Davis D.S.
        • Parish C.R.
        • Hulett M.D.
        Histidine-rich glycoprotein: the Swiss Army knife of mammalian plasma.
        Blood. 2011; 117: 2093-2101
        • Lee C.
        • Bongcam-Rudloff E.
        • Sollner C.
        • Jahnen-Dechent W.
        • Claesson-Welsh L.
        Type 3 cystatins; fetuins, kininogen and histidine-rich glycoprotein.
        Front Biosci (Landmark Ed). 2009; 14: 2911-2922
        • Waterhouse A.
        • Bertoni M.
        • Bienert S.
        • Studer G.
        • Tauriello G.
        • Gumienny R.
        • Heer F.T.
        • de Beer T.A.P.
        • Rempfer C.
        • Bordoli L.
        • Lepore R.
        • Schwede T.
        SWISS-MODEL: homology modelling of protein structures and complexes.
        Nucleic Acids Research. 2018; 46: W296-W303
        • Kassaar O.
        • Schwarz-Linek U.
        • Blindauer C.A.
        • Stewart A.J.
        Plasma free fatty acid levels influence Zn(2+) -dependent histidine-rich glycoprotein-heparin interactions via an allosteric switch on serum albumin.
        J Thromb Haemost. 2015; 13: 101-110
        • Mori S.
        • Takahashi H.K.
        • Yamaoka K.
        • Okamoto M.
        • Nishibori M.
        High affinity binding of serum histidine-rich glycoprotein to nickel-nitrilotriacetic acid: the application to microquantification.
        Life Sci. 2003; 73: 93-102
        • Weyrauch A.K.
        • Jakob M.
        • Schierhorn A.
        • Klösgen R.B.
        • Hinderberger D.
        Purification of rabbit serum histidine-proline-rich glycoprotein via preparative gel electrophoresis and characterization of its glycosylation patterns.
        PLoS One. 2017; 12e0184968
        • Kassaar O.
        • McMahon S.A.
        • Thompson R.
        • Botting C.H.
        • Naismith J.H.
        • Stewart A.J.
        Crystal structure of histidine-rich glycoprotein N2 domain reveals redox activity at an interdomain disulfide bridge: implications for angiogenic regulation.
        Blood. 2014; 123: 1948-1955
        • Colwell M.
        • Ahmed N.
        • Butkowski R.
        Detection of histidine-rich glycoprotein and fibrinogen with nickel-enzyme conjugates: Purification of rabbit HRG.
        Anal Biochem. 2017; 525: 67-72
        • Patel K.K.
        • Poon I.K.H.
        • Talbo G.H.
        • Perugini M.A.
        • Taylor N.L.
        • Ralph T.J.
        • Hoogenraad N.J.
        • Hulett M.D.
        New method for purifying histidine-rich glycoprotein from human plasma redefines its functional properties.
        IUBMB Life. 2013; 65: 550-563
        • Heimburger N.
        • Haupt H.
        • Kranz T.
        • Baudner S.
        [Human serum proteins with high affinity to carboxymethylcellulose. II. Physico-chemical and immunological characterization of a histidine-rich 3,8S- 2 -glycoportein (CM-protein I)].
        Hoppe-Seyler’s Zeitschrift fur physiologische Chemie. 1972; 353: 1133-1140
        • Poon I.K.H.
        • Hulett M.D.
        • Parish C.R.
        Histidine-rich glycoprotein is a novel plasma pattern recognition molecule that recruits IgG to facilitate necrotic cell clearance via FcgammaRI on phagocytes.
        Blood. 2010; 115: 2473-2482
        • Lijnen H.R.
        • Hoylaerts M.
        • Collen D.
        Isolation and characterization of a human plasma protein with affinity for the lysine binding sites in plasminogen. Role in the regulation of fibrinolysis and identification as histidine-rich glycoprotein.
        The Journal of biological chemistry. 1980; 255: 10214-10222
        • Leung L.L.
        Interaction of histidine-rich glycoprotein with fibrinogen and fibrin.
        The Journal of clinical investigation. 1986; 77: 1305-1311
        • Katagiri M.
        • Tsutsui K.
        • Yamano T.
        • Shimonishi Y.
        • Ishibashi F.
        Interaction of heme with a synthetic peptide mimicking the putative heme-binding site of histidine-rich glycoprotein.
        Biochemical and biophysical research communications. 1987; 149: 1070-1076
        • Morgan W.T.
        Interactions of the histidine-rich glycoprotein of serum with metals.
        Biochemistry. 1981; 20: 1054-1061
        • Thulin A.
        • Ringvall M.
        • Dimberg A.
        • Kårehed K.
        • Väisänen T.
        • Väisänen M.-R.
        • Hamad O.
        • Wang J.
        • Bjerkvig R.
        • Nilsson B.
        • Pihlajaniemi T.
        • Akerud H.
        • Pietras K.
        • Jahnen-Dechent W.
        • Siegbahn A.
        • Olsson A.-K.
        Activated platelets provide a functional microenvironment for the antiangiogenic fragment of histidine-rich glycoprotein.
        Molecular cancer research : MCR. 2009; 7: 1792-1802
        • Olsson A.-K.
        • Larsson H.
        • Dixelius J.
        • Johansson I.
        • Lee C.
        • Oellig C.
        • Björk I.
        • Claesson-Welsh L.
        A fragment of histidine-rich glycoprotein is a potent inhibitor of tumor vascularization.
        Cancer Res. 2004; 64: 599-605
        • Kärrlander M.
        • Lindberg N.
        • Olofsson T.
        • Kastemar M.
        • Olsson A.-K.
        • Uhrbom L.
        Histidine-Rich Glycoprotein Can Prevent Development of Mouse Experimental Glioblastoma.
        PLOS ONE. 2009; 4: e8536
        • Gorgani N.N.
        • Parish C.R.
        • Easterbrook Smith S.B.
        • Altin J.G.
        Histidine-rich glycoprotein binds to human IgG and C1q and inhibits the formation of insoluble immune complexes.
        Biochemistry. 1997; 36: 6653-6662
        • Poon I.K.H.
        • Parish C.R.
        • Hulett M.D.
        Histidine-rich glycoprotein functions cooperatively with cell surface heparan sulfate on phagocytes to promote necrotic cell uptake.
        Journal of leukocyte biology. 2010; 88: 559-569
        • Rydengård V.
        • Shannon O.
        • Lundqvist K.
        • Kacprzyk L.
        • Chalupka A.
        • Olsson A.-K.
        • Mörgelin M.
        • Jahnen-Dechent W.
        • Malmsten M.
        • Schmidtchen A.
        Histidine-rich glycoprotein protects from systemic Candida infection.
        PLoS pathogens. 2008; 4: e1000116
        • Gorgani N.N.
        • Smith B.A.
        • Kono D.H.
        • Theofilopoulos A.N.
        Histidine-rich glycoprotein binds to DNA and Fc gamma RI and potentiates the ingestion of apoptotic cells by macrophages.
        Journal of immunology (Baltimore, Md. : 1950). 2002; 169: 4745-4751
        • Jones A.L.
        • Poon I.K.H.
        • Hulett M.D.
        • Parish C.R.
        Histidine-rich glycoprotein specifically binds to necrotic cells via its amino-terminal domain and facilitates necrotic cell phagocytosis.
        The Journal of biological chemistry. 2005; 280: 35733-35741
        • Tsuchida-Straeten N.
        • Ensslen S.
        • Schäfer C.
        • Wöltje M.
        • Denecke B.
        • Moser M.
        • Gräber S.
        • Wakabayashi S.
        • Koide T.
        • Jahnen-Dechent W.
        Enhanced blood coagulation and fibrinolysis in mice lacking histidine-rich glycoprotein (HRG).
        Journal of Thrombosis and Haemostasis. 2005; 3: 865-872
        • Wakabayashi S.
        • Koide T.
        Histidine-rich glycoprotein: a possible modulator of coagulation and fibrinolysis.
        Seminars in thrombosis and hemostasis. 2011; 37: 389-394
        • Sherry S.T.
        • Ward M.H.
        • Kholodov M.
        • Baker J.
        • Phan L.
        • Smigielski E.M.
        • Sirotkin K.
        dbSNP: the NCBI database of genetic variation.
        Nucleic acids research. 2001; 29: 308-311
        • Hennis B.C.
        • van Boheemen P.A.
        • Wakabayashi S.
        • Koide T.
        • Hoffmann J.J.
        • Kievit P.
        • Dooijewaard G.
        • Jansen J.G.
        • Kluft C.
        Identification and genetic analysis of a common molecular variant of histidine-rich glycoprotein with a difference of 2kD in apparent molecular weight.
        Thrombosis and haemostasis. 1995; 74: 1491-1496
        • Hong M.-G.
        • Dodig-Crnković T.
        • Chen X.
        • Drobin K.
        • Lee W.
        • Wang Y.
        • Edfors F.
        • Kotol D.
        • Thomas C.E.
        • Sjöberg R.
        • Odeberg J.
        • Hamsten A.
        • Silveira A.
        • Hall P.
        • Nilsson P.
        • Pawitan Y.
        • Uhlén M.
        • Pedersen N.L.
        • Hägg S.
        • Magnusson P.K.
        • Schwenk J.M.
        Profiles of histidine-rich glycoprotein associate with age and risk of all-cause mortality.
        Life science alliance. 2020; 3
        • Brunel H.
        • Massanet R.
        • Martinez-Perez A.
        • Ziyatdinov A.
        • Martin-Fernandez L.
        • Souto J.C.
        • Perera A.
        • Soria J.M.
        The Central Role of KNG1 Gene as a Genetic Determinant of Coagulation Pathway-Related Traits: Exploring Metaphenotypes.
        PloS one. 2016; 11: e0167187
        • Clerc F.
        • Reiding K.R.
        • Jansen B.C.
        • Kammeijer G.S.M.
        • Bondt A.
        • Wuhrer M.
        Human plasma protein N-glycosylation.
        Glycoconj J. 2016; 33: 309-343
        • Lee J.
        • Mun S.
        • Park A.
        • Kim D.
        • Lee Y.-J.
        • Kim H.-J.
        • Choi H.
        • Shin M.
        • Lee S.J.
        • Kim J.G.
        • Chun Y.T.
        • Kang H.-G.
        Proteomics Reveals Plasma Biomarkers for Ischemic Stroke Related to the Coagulation Cascade.
        J Mol Neurosci. 2020; 70: 1321-1331
        • Liu Y.
        • He J.
        • Li C.
        • Benitez R.
        • Fu S.
        • Marrero J.
        • Lubman D.M.
        Identification and confirmation of biomarkers using an integrated platform for quantitative analysis of glycoproteins and their glycosylations.
        J Proteome Res. 2010; 9: 798-805
        • Nishibori M.
        • Wake H.
        • Morimatsu H.
        Histidine-rich glycoprotein as an excellent biomarker for sepsis and beyond.
        Crit Care. 2018; 22: 209
        A global reference for human genetic variation.
        Nature. 2015; 526: 68-74
        • Cunningham F.
        • Allen J.E.
        • Allen J.
        • Alvarez-Jarreta J.
        • Amode M.R.
        • Armean I.M.
        • Austine-Orimoloye O.
        • Azov A.G.
        • Barnes I.
        • Bennett R.
        • Berry A.
        • Bhai J.
        • Bignell A.
        • Billis K.
        • Boddu S.
        • Brooks L.
        • Charkhchi M.
        • Cummins C.
        • Da Rin Fioretto L.
        • Davidson C.
        • Dodiya K.
        • Donaldson S.
        • El Houdaigui B.
        • El Naboulsi T.
        • Fatima R.
        • Giron C.G.
        • Genez T.
        • Martinez J.G.
        • Guijarro-Clarke C.
        • Gymer A.
        • Hardy M.
        • Hollis Z.
        • Hourlier T.
        • Hunt T.
        • Juettemann T.
        • Kaikala V.
        • Kay M.
        • Lavidas I.
        • Le T.
        • Lemos D.
        • Marugán J.C.
        • Mohanan S.
        • Mushtaq A.
        • Naven M.
        • Ogeh D.N.
        • Parker A.
        • Parton A.
        • Perry M.
        • Piližota I.
        • Prosovetskaia I.
        • Sakthivel M.P.
        • Salam A.I.A.
        • Schmitt B.M.
        • Schuilenburg H.
        • Sheppard D.
        • Pérez-Silva J.G.
        • Stark W.
        • Steed E.
        • Sutinen K.
        • Sukumaran R.
        • Sumathipala D.
        • Suner M.-M.
        • Szpak M.
        • Thormann A.
        • Tricomi F.F.
        • Urbina-Gómez D.
        • Veidenberg A.
        • Walsh T.A.
        • Walts B.
        • Willhoft N.
        • Winterbottom A.
        • Wass E.
        • Chakiachvili M.
        • Flint B.
        • Frankish A.
        • Giorgetti S.
        • Haggerty L.
        • Hunt S.E.
        • IIsley G.R.
        • Loveland J.E.
        • Martin F.J.
        • Moore B.
        • Mudge J.M.
        • Muffato M.
        • Perry E.
        • Ruffier M.
        • Tate J.
        • Thybert D.
        • Trevanion S.J.
        • Dyer S.
        • Harrison P.W.
        • Howe K.L.
        • Yates A.D.
        • Zerbino D.R.
        • Flicek P.
        Ensembl 2022.
        Nucleic Acids Research. 2022; 50: D988-D995
        • Howe K.L.
        • Achuthan P.
        • Allen J.
        • Allen J.
        • Alvarez-Jarreta J.
        • Amode M.R.
        • Armean I.M.
        • Azov A.G.
        • Bennett R.
        • Bhai J.
        • Billis K.
        • Boddu S.
        • Charkhchi M.
        • Cummins C.
        • Da Rin Fioretto L.
        • Davidson C.
        • Dodiya K.
        • El Houdaigui B.
        • Fatima R.
        • Gall A.
        • Garcia Giron C.
        • Grego T.
        • Guijarro-Clarke C.
        • Haggerty L.
        • Hemrom A.
        • Hourlier T.
        • Izuogu O.G.
        • Juettemann T.
        • Kaikala V.
        • Kay M.
        • Lavidas I.
        • Le T.
        • Lemos D.
        • Gonzalez Martinez J.
        • Marugán J.C.
        • Maurel T.
        • McMahon A.C.
        • Mohanan S.
        • Moore B.
        • Muffato M.
        • Oheh D.N.
        • Paraschas D.
        • Parker A.
        • Parton A.
        • Prosovetskaia I.
        • Sakthivel M.P.
        • Salam A.I.A.
        • Schmitt B.M.
        • Schuilenburg H.
        • Sheppard D.
        • Steed E.
        • Szpak M.
        • Szuba M.
        • Taylor K.
        • Thormann A.
        • Threadgold G.
        • Walts B.
        • Winterbottom A.
        • Chakiachvili M.
        • Chaubal A.
        • De Silva N.
        • Flint B.
        • Frankish A.
        • Hunt S.E.
        • IIsley G.R.
        • Langridge N.
        • Loveland J.E.
        • Martin F.J.
        • Mudge J.M.
        • Morales J.
        • Perry E.
        • Ruffier M.
        • Tate J.
        • Thybert D.
        • Trevanion S.J.
        • Cunningham F.
        • Yates A.D.
        • Zerbino D.R.
        • Flicek P.
        Ensembl 2021.
        Nucleic Acids Research. 2021; 49: D884-D891
        • Jager S.
        • Cramer D.A.T.
        • Hoek M.
        • Mokiem N.J.
        • van Keulen B.J.
        • van Goudoever J.B.
        • Dingess K.A.
        • Heck A.J.R.
        Proteoform Profiles Reveal That Alpha-1-Antitrypsin in Human Serum and Milk Is Derived From a Common Source.
        Front Mol Biosci. 2022; 9: 858856
        • Chang K.T.
        • Guo J.
        • di Ronza A.
        • Sardiello M.
        Aminode: Identification of Evolutionary Constraints in the Human Proteome.
        Sci Rep. 2018; 8: 1357
        • Perez-Riverol Y.
        • Bai J.
        • Bandla C.
        • García-Seisdedos D.
        • Hewapathirana S.
        • Kamatchinathan S.
        • Kundu D.J.
        • Prakash A.
        • Frericks-Zipper A.
        • Eisenacher M.
        • Walzer M.
        • Wang S.
        • Brazma A.
        • Vizcaíno J.A.
        The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.
        Nucleic Acids Res. 2022; 50: D543-D552
        • Bienert S.
        • Waterhouse A.
        • de Beer T.A.P.
        • Tauriello G.
        • Studer G.
        • Bordoli L.
        • Schwede T.
        The SWISS-MODEL Repository—new features and functionality.
        Nucleic Acids Research. 2017; 45: D313-D319
        • Guex N.
        • Peitsch M.C.
        • Schwede T.
        Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective.
        ELECTROPHORESIS. 2009; 30: S162-S173
        • Studer G.
        • Rempfer C.
        • Waterhouse A.M.
        • Gumienny R.
        • Haas J.
        • Schwede T.
        QMEANDisCo—distance constraints applied on model quality estimation.
        Bioinformatics. 2020; 36: 1765-1771
        • Bertoni M.
        • Kiefer F.
        • Biasini M.
        • Bordoli L.
        • Schwede T.
        Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology.
        Sci Rep. 2017; 7: 10480