Biology:Glycan-protein interactions

Short description: Class of biological intermolecular interactions
3D visualization of spike protein with 3 different subunits highlighted in different colors and solvent-facing glycans colored in blue
Spike (S) protein responsible for the binding to ACE2 receptors in COVID-19. Glycans highlighted in blue. Structure taken from PDB entry 6VXX[1]

Glycan-protein interactions are a class of biomolecular interactions that occur between free or protein-bound glycans and their cognate binding partners. Intramolecular glycan-protein (protein-glycan) interactions occur between glycans and the proteins to which they are covalently attached. Together with protein-protein interactions, they form a mechanistic basis for many essential cell processes, especially cell-cell interactions and host-cell interactions.[2] For instance, SARS-CoV-2, the causative agent of COVID-19, employs its extensively glycosylated spike (S) protein to bind to the ACE2 receptor, allowing it to enter host cells.[3] The spike protein is a trimeric structure, with each subunit containing 22 N-glycosylation sites, making it an attractive target for vaccine development.[3][4]

Glycosylation, i.e., the addition of glycans (a generic name for monosaccharides and oligosaccharides) to a protein, is one of the major post-translational modifications of proteins and contributes to the enormous biological complexity of life. Indeed, three different hexoses could theoretically produce from 1056 to 27,648 unique trisaccharides, in contrast to only 6 peptides or oligonucleotides formed from 3 different amino acids or 3 different nucleotides, respectively.[2] In contrast to template-driven protein biosynthesis, the "language" of glycosylation is still unknown, which, given the prevalence of glycans in living organisms, makes glycobiology a hot topic of current research.[2]

The study of glycan-protein interactions provides insight into the mechanisms of cell signaling and enables the development of better diagnostic tools for many diseases, including cancer. Indeed, there are no known types of cancer that do not involve aberrant patterns of protein glycosylation.[5]

Thermodynamics of Binding

The binding of glycan-binding proteins (GBPs) to glycans can be modeled as a simple equilibrium. Denoting glycans as [math]\displaystyle{ G }[/math] and proteins as [math]\displaystyle{ P }[/math]:

[math]\displaystyle{ \text{Protein } (P) + \text{Glycan } (G) \rightleftharpoons PG }[/math]

With an associated equilibrium constant of

[math]\displaystyle{ K_a = \frac{[PG]}{[P][G]} }[/math]

By biochemical convention, binding is usually reported as the dissociation constant [math]\displaystyle{ K_D }[/math], which is the reciprocal of [math]\displaystyle{ K_a }[/math]:

[math]\displaystyle{ K_D = \frac{1}{K_a} = \frac{[P][G]}{[PG]} }[/math]
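The single-site equilibrium above can be turned into numbers with a short sketch. It assumes that the total concentrations of protein and glycan are known and solves the quadratic that follows from the definition of the dissociation constant together with the two mass balances. The function names and the example concentrations are illustrative choices, not values from the cited sources.

```python
import math


def complex_concentration(c_p, c_g, k_d):
    """Equilibrium [PG] for P + G <=> PG.

    c_p, c_g: total protein and glycan concentrations; k_d: dissociation
    constant. All three must use the same molar units.
    From k_d = (c_p - x)(c_g - x) / x with x = [PG]:
        x^2 - (c_p + c_g + k_d) x + c_p c_g = 0
    The smaller root is the physically meaningful one.
    """
    b = c_p + c_g + k_d
    return (b - math.sqrt(b * b - 4.0 * c_p * c_g)) / 2.0


def association_constant(k_d):
    """K_a is the reciprocal of the dissociation constant."""
    return 1.0 / k_d


# Hypothetical example: 1 uM protein, 1 mM glycan, millimolar dissociation constant.
c_p, c_g, k_d = 1e-6, 1e-3, 1e-3
pg = complex_concentration(c_p, c_g, k_d)
print(f"[PG] = {pg:.3e} M, fraction of protein bound = {pg / c_p:.3f}")
```

When the glycan is in large excess, the bound fraction approaches [G] / ([G] + K_D), so a glycan concentration equal to the dissociation constant leaves about half of the protein bound.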

Given that many GBPs exhibit multivalency, this model may be expanded to account for multiple equilibria:

[math]\displaystyle{ P + G \rightleftharpoons PG }[/math]

[math]\displaystyle{ PG + G \rightleftharpoons PG_2 }[/math]

[math]\displaystyle{ \dots }[/math]

[math]\displaystyle{ PG_{n-1} + G \rightleftharpoons PG_n }[/math]

Denoting the cumulative equilibrium for the binding of [math]\displaystyle{ i }[/math] ligands as

[math]\displaystyle{ P + iG \rightleftharpoons PG_i }[/math]

With corresponding equilibrium constant:

[math]\displaystyle{ \beta_i = \frac{[PG_i]}{[P][G]^i} }[/math]

Writing the material balance for the protein ([math]\displaystyle{ c_P }[/math] denotes the total concentration of protein):

[math]\displaystyle{ c_P = [P] + [PG] + \dots + [PG_n] }[/math]

Expressing each term through the corresponding cumulative equilibrium constant gives:

[math]\displaystyle{ c_P = [P](1 + \beta_1[G] + \dots + \beta_n [G]^n) }[/math]

The concentration of free protein is, thus:

[math]\displaystyle{ [P] = \frac{c_P}{1 + \sum_{i=1}^{n}{\beta_i[G]^i}} }[/math]

If [math]\displaystyle{ n=1 }[/math], i.e. there is only one carbohydrate-binding site, the equation reduces to

[math]\displaystyle{ [P] = \frac{c_P}{1 + \beta_1 [G]} }[/math]

As the number of binding sites [math]\displaystyle{ n }[/math] increases, more terms are added to the denominator, so the concentration of free protein decreases; hence, the apparent [math]\displaystyle{ K_D }[/math] decreases too.
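The multi-site expressions above can be evaluated directly. The sketch below is a minimal illustration: it takes the free glycan concentration and a list of cumulative constants and returns the free protein concentration and the fraction of protein in each bound state. The stepwise association constants K_1 … K_n used by the helper `cumulative_from_stepwise` are an assumed notation (not used in the text above) for the individual equilibria, with each cumulative constant being the product of the stepwise constants up to that step. All numerical values are hypothetical.

```python
def cumulative_from_stepwise(stepwise):
    """beta_i = K_1 * K_2 * ... * K_i for stepwise association constants K_j."""
    betas, product = [], 1.0
    for k in stepwise:
        product *= k
        betas.append(product)
    return betas


def free_protein(c_p, g_free, betas):
    """[P] = c_P / (1 + sum_i beta_i [G]^i), with betas = [beta_1, ..., beta_n]."""
    denominator = 1.0 + sum(
        beta * g_free ** i for i, beta in enumerate(betas, start=1)
    )
    return c_p / denominator


def species_fractions(g_free, betas):
    """Fractions of total protein present as P, PG, PG_2, ..., PG_n."""
    terms = [1.0] + [beta * g_free ** i for i, beta in enumerate(betas, start=1)]
    total = sum(terms)
    return [term / total for term in terms]


# Hypothetical example: three sites with stepwise constants of 1e3 per molar each.
betas = cumulative_from_stepwise([1e3, 1e3, 1e3])
g_free = 1e-3  # free glycan concentration in molar
print(free_protein(1e-6, g_free, betas))
print(species_fractions(g_free, betas))
```

Comparing `free_protein` for `betas[:1]` and for the full list shows the effect described above: each additional site adds a positive term to the denominator and lowers the free protein concentration.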

Binding with aromatic rings

The schematic representation of CH-pi interactions including key physical characteristics: the angle from the normal (theta) and a distance from the C-atom to the plane of the ring (here, distance from C to X)
Figure 1. The schematic representation of [math]\displaystyle{ CH-\pi }[/math] interactions

Chemical intuition suggests that glycan-binding sites should be enriched in polar amino acid residues that form non-covalent interactions, such as hydrogen bonds, with polar carbohydrates. Indeed, a statistical analysis of carbohydrate-binding pockets shows that aspartic acid and asparagine residues are present twice as often as would be predicted by chance.[6] Surprisingly, there is an even stronger preference for aromatic amino acids: tryptophan shows a 9-fold increase in prevalence, tyrosine a 3-fold increase, and histidine a 2-fold increase. The underlying force has been shown to be the [math]\displaystyle{ CH-\pi }[/math] interaction between the aromatic [math]\displaystyle{ \pi }[/math] system and a [math]\displaystyle{ C-H }[/math] bond of the carbohydrate, as shown in Figure 1. A [math]\displaystyle{ CH-\pi }[/math] interaction is identified if [math]\displaystyle{ \theta \leqslant 40 }[/math]° and the [math]\displaystyle{ CH-\pi }[/math] distance (the distance from [math]\displaystyle{ C }[/math] to [math]\displaystyle{ X }[/math]) is less than 4.5 Å.[6]
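The geometric criterion can be sketched as a small check on atomic coordinates. The text does not fully specify the construction, so the following is an assumption made for illustration: X is taken to be the centroid of the aromatic ring, and θ is taken to be the angle between the ring normal and the vector from X to the carbohydrate carbon. The function names and the idealized ring coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ring_centroid(ring):
    """Centroid X of the ring atoms (assumed definition of X)."""
    n = len(ring)
    return tuple(sum(atom[k] for atom in ring) / n for k in range(3))


def ring_normal(ring):
    """Unit normal of the ring; atoms must be listed in order around the ring."""
    center = ring_centroid(ring)
    rel = [_sub(atom, center) for atom in ring]
    normal = (0.0, 0.0, 0.0)
    # Sum the cross products of consecutive centroid-to-atom vectors.
    for p, q in zip(rel, rel[1:] + rel[:1]):
        c = _cross(p, q)
        normal = (normal[0] + c[0], normal[1] + c[1], normal[2] + c[2])
    length = math.sqrt(_dot(normal, normal))
    return tuple(v / length for v in normal)


def ch_pi_geometry(carbon, ring):
    """Return (theta in degrees, C-to-X distance in the input length units)."""
    v = _sub(carbon, ring_centroid(ring))
    distance = math.sqrt(_dot(v, v))
    # abs() makes the result independent of which way the normal points.
    cos_theta = abs(_dot(v, ring_normal(ring))) / distance
    return math.degrees(math.acos(min(1.0, cos_theta))), distance


def is_ch_pi(carbon, ring, max_theta_deg=40.0, max_distance=4.5):
    """Apply the thresholds quoted in the text (40 degrees, 4.5 angstroms)."""
    theta, distance = ch_pi_geometry(carbon, ring)
    return theta <= max_theta_deg and distance < max_distance


# Idealized planar six-membered ring in the xy-plane (coordinates in angstroms).
ring = [
    (1.39 * math.cos(math.radians(60 * k)), 1.39 * math.sin(math.radians(60 * k)), 0.0)
    for k in range(6)
]
print(is_ch_pi((0.0, 0.0, 3.8), ring))  # carbon directly above the ring center
print(is_ch_pi((3.5, 0.0, 1.0), ring))  # carbon far off to the side of the ring
```

In practice the coordinates would come from a structure file; parsing is left out here to keep the sketch self-contained.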

Effects of stereochemistry

The definition of the alpha (bottom) and beta (top) faces for glucose and galactose. The stereochemical difference between the two hexoses is highlighted in red.

This [math]\displaystyle{ CH-\pi }[/math] interaction strongly depends on the stereochemistry of the carbohydrate molecule. For example, consider the top ([math]\displaystyle{ \beta }[/math]) and bottom ([math]\displaystyle{ \alpha }[/math]) faces of [math]\displaystyle{ \beta }[/math]-D-glucose and [math]\displaystyle{ \beta }[/math]-D-galactose. A single change in the stereochemistry at the C4 carbon has been shown to shift the preference of aromatic residues from the [math]\displaystyle{ \beta }[/math] face (2.7-fold preference for glucose) to the [math]\displaystyle{ \alpha }[/math] face (14-fold preference for galactose).[6]

Effects of electronics

A comparison of the electrostatic surface potentials (ESPs) of the aromatic rings in tryptophan, tyrosine, phenylalanine, and histidine suggests that electronic effects also play a role in glycan binding (see Figure 2). After the electron densities are normalized for surface area, tryptophan remains the most electron-rich acceptor of [math]\displaystyle{ CH-\pi }[/math] interactions, suggesting a possible reason for its 9-fold prevalence in carbohydrate-binding pockets.[6] Overall, the electrostatic potential maps follow the prevalence trend of [math]\ce{ Trp >> Tyr > (Phe) > His }[/math].

electrostatic surface potential maps of tryptophan, tyrosine, phenylalanine, and histidine that show differences in electron density in their aromatic rings
Figure 2. Electrostatic Surface Potentials (ESPs) of aromatic amino acids. Electron rich areas are depicted with red, while electron poor areas are depicted with blue.

Carbohydrate-binding partners

There are many proteins capable of binding to glycans, including lectins, antibodies, microbial adhesins, viral agglutinins, etc.

Lectins

Lectin is a generic name for proteins with carbohydrate-recognition domains (CRDs). Although the term has become almost synonymous with glycan-binding proteins, it does not include antibodies, which also belong to that class.

Lectins found in plant and fungal cells have been extensively used in research as tools to detect, purify, and analyze glycans. However, useful lectins usually have sub-optimal specificities. For instance, Ulex europaeus agglutinin-1 (UEA-1), a plant-extracted lectin capable of binding to human blood type O antigen, can also bind to unrelated glycans such as 2'-fucosyllactose, GalNAcα1-4(Fucα1-2)Galβ1-4GlcNAc, and Lewis-Y antigen.[7]

Antibodies

Although antibodies exhibit nanomolar affinities toward protein antigens, their specificity for glycans is very limited.[8] In fact, available antibodies may bind fewer than 4% of the 7000 mammalian glycan antigens; moreover, most of those antibodies have low affinity and exhibit cross-reactivity.[9][7]

Lambodies

In contrast with jawed vertebrates, whose immunity is based on variable, diverse, and joining gene segments (VDJs) of immunoglobulins, the jawless vertebrates, such as lamprey and hagfish, create receptor diversity by somatic DNA rearrangement of leucine-rich repeat (LRR) modules that are incorporated into vlr genes (variable lymphocyte receptors).[10] These LRRs form 3D structures resembling curved solenoids that selectively bind specific glycans.[11]

A study from the University of Maryland has shown that lamprey antibodies (lambodies) can selectively bind to tumor-associated carbohydrate antigens (such as Tn and TF[math]\displaystyle{ \alpha }[/math]) at nanomolar affinities.[9] The T-nouvelle antigen (Tn) and TF[math]\displaystyle{ \alpha }[/math] are present on proteins in as many as 90% of different cancer cells after post-translational modification, whereas in healthy cells the corresponding glycans are much more complex. Selecting lambodies that bind aGPA, a human erythrocyte membrane glycoprotein that is covered with 16 TF[math]\displaystyle{ \alpha }[/math] moieties, by magnetic-activated cell sorting (MACS) and fluorescence-activated cell sorting (FACS) yielded the leucine-rich lambody VLRB.aGPA.23. This lambody selectively stained (over healthy samples) cells from 14 different types of adenocarcinomas: bladder, esophagus, ovary, tongue, cheek, cervix, liver, nose, nasopharynx, greater omentum, colon, breast, larynx, and lung.[9] Moreover, patients whose tissues stained positive with VLRB.aGPA.23 had a significantly lower survival rate.[9]

A close look at the crystal structure of VLRB.aGPA.23 reveals a tryptophan residue at position 187 right over the carbohydrate binding pocket.[12]

Crystal Structure of lambody VLRB.aGPA.23 created from PDB Entry 4K79, showing a solenoid-like structure consisting of beta-sheets and a tryptophan residue over the bound carbohydrate
Crystal Structure of VLRB.aGPA.23 created from PDB Entry 4K79[12]

Multivalency in structure

Cartoon depiction of common oligomeric structures of lectins, including dimers, trimers, pentamers, hexamers, pores formed from stacking hexamers, bouquets of trimers connected by peptide linkers, or cruciform (4 trimers spread out in a cross-resembling fashion)
Cartoon depiction of common oligomeric structures of lectins

Many glycan-binding proteins (GBPs) are oligomeric and typically contain multiple sites for glycan binding (also called carbohydrate-recognition domains, CRDs). The ability to form multivalent protein-ligand interactions significantly enhances the strength of binding: while [math]\displaystyle{ K_D }[/math] values for individual CRD-glycan interactions may be in the mM range, the overall affinity of a GBP towards glycans may reach nanomolar or even picomolar ranges. The overall strength of the interaction is described as the avidity [math]\displaystyle{ K_D }[/math] (in contrast with an affinity [math]\displaystyle{ K_D }[/math], which describes a single equilibrium). Sometimes the avidity is also called an apparent [math]\displaystyle{ K_D }[/math] to emphasize the non-equilibrium nature of the interaction.[13]
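To get a feel for the gap between millimolar and nanomolar or picomolar binding, the dissociation constants can be converted to standard binding free energies with the textbook relation ΔG° = RT ln(K_D / 1 M). The sketch below is only a rough illustration: the temperature of 298.15 K is an assumed value, and treating an apparent (avidity) constant as if it were a true equilibrium constant is a simplification made here, not a claim from the cited sources.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def binding_free_energy_kj(k_d_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol for a dissociation constant in M.

    Uses Delta G = R * T * ln(K_D / 1 M); a more negative value means
    tighter binding.
    """
    return R * temperature_k * math.log(k_d_molar) / 1000.0


for label, k_d in [("1 mM", 1e-3), ("1 nM", 1e-9), ("1 pM", 1e-12)]:
    print(f"K_D = {label}: {binding_free_energy_kj(k_d):.1f} kJ/mol")
```

Because the free energy scales with the logarithm of the constant, each thousand-fold decrease in the dissociation constant adds the same increment of binding free energy.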

Common oligomeric structures of lectins are shown in the accompanying figure. For example, galectins are usually observed as dimers, while intelectins form trimers and pentraxins assemble into pentamers. Larger structures, like hexameric Reg proteins, may assemble into membrane-penetrating pores. Collectins may form even more elaborate complexes: bouquets of trimers or even cruciform-like structures (e.g. in SP-D).[14]

Current Research

Given the importance of glycan-protein interactions, ongoing research is dedicated to a) creating new tools to detect glycan-protein interactions and b) using those tools to decipher the so-called sugar code.

Glycan Arrays

Glycan arrays are among the most widely used tools for probing glycan-protein interactions. A glycan array is usually an NHS- or epoxy-activated glass slide on which various glycans have been printed robotically.[15][16] Commercially available arrays may contain up to 600 different glycans, the specificity of which has been extensively studied.[17]

Glycan-protein interactions may be detected by probing the array with proteins of interest (or libraries of them) that bear fluorescent tags. The structure of the glycan-binding protein may be deciphered by several analytical methods, including mass-spectrometry techniques (MALDI-MS, LC-MS, tandem MS-MS) and 2D NMR.[18]

Bioinformatics driven research

Computational methods have been applied to search for parameters (e.g. residue propensity, hydrophobicity, planarity) that could distinguish glycan-binding sites from other surface patches. For example, a model trained on 19 non-homologous carbohydrate-binding structures was able to predict carbohydrate-recognition domains (CRDs) with an accuracy of 65% for non-enzymatic structures and 87% for enzymatic ones.[19] Further studies have employed calculations of van der Waals energies of protein-probe interactions and amino acid propensities to identify CRDs with 98% specificity at 73% sensitivity.[20] More recent methods can predict CRDs even from protein sequences, by comparing the sequence with those for which structures are already known.[21]
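The performance figures quoted above (accuracy, specificity, sensitivity) follow the standard definitions for a binary classifier. The sketch below shows how they are computed from a confusion matrix. The counts are invented purely so that the example reproduces 73% sensitivity and 98% specificity; they are not data from the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp: binding sites correctly predicted as binding
    fp: non-binding patches wrongly predicted as binding
    tn: non-binding patches correctly rejected
    fn: binding sites that were missed
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 100 true binding sites and 1000 non-binding patches.
metrics = classification_metrics(tp=73, fp=20, tn=980, fn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced classes such as these, accuracy is dominated by the non-binding patches, which is one reason sensitivity and specificity are often reported separately.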

Sugar code

In contrast with protein studies, where the primary protein structure is unambiguously defined by the sequence of nucleotides (the genetic code), glycobiology still cannot explain how a certain "message" is encoded using carbohydrates or how it is "read" and "translated" by other biological entities.

An interdisciplinary effort, combining chemistry, biology, and biochemistry, studies glycan-protein interactions to see how different sequences of carbohydrates initiate different cellular responses.[22]

References

  1. Walls, Alexandra C.; Park, Young-Jun; Tortorici, M. Alejandra; Wall, Abigail; McGuire, Andrew T.; Veesler, David (2020-03-09). "Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein". Cell 181 (2): 281–292.e6. doi:10.1016/j.cell.2020.02.058. ISSN 0092-8674. PMID 32155444. 
  2. "Historical Background and Overview". Essentials of Glycobiology (3rd ed.). Cold Spring Harbor Laboratory Press. 2015. doi:10.1101/glycobiology.3e.001. http://www.ncbi.nlm.nih.gov/books/NBK316258/. Retrieved 2020-05-09. 
  3. Watanabe, Yasunori; Allen, Joel D.; Wrapp, Daniel; McLellan, Jason S.; Crispin, Max (2020-05-04). "Site-specific glycan analysis of the SARS-CoV-2 spike" (in en). Science 369 (6501): 330–333. doi:10.1126/science.abb9983. ISSN 0036-8075. PMID 32366695. Bibcode: 2020Sci...369..330W. 
  4. Amanat, Fatima; Krammer, Florian (2020-04-06). "SARS-CoV-2 Vaccines: Status Report". Immunity 52 (4): 583–589. doi:10.1016/j.immuni.2020.03.007. ISSN 1074-7613. PMID 32259480. 
  5. "Anti-Thomsen-Friedenreich-Ag (anti-TF-Ag) potential for cancer therapy". Frontiers in Bioscience 4 (3): 840–63. January 2012. doi:10.2741/s304. PMID 22202095. 
  6. "Carbohydrate-Aromatic Interactions in Proteins". Journal of the American Chemical Society 137 (48): 15152–60. December 2015. doi:10.1021/jacs.5b08424. PMID 26561965. 
  7. "Structural Insights into VLR Fine Specificity for Blood Group Carbohydrates". Structure 25 (11): 1667–1678.e4. November 2017. doi:10.1016/j.str.2017.09.003. PMID 28988747. 
  8. "Antibody specificity and promiscuity". The Biochemical Journal 476 (3): 433–447. February 2019. doi:10.1042/BCJ20180670. PMID 30723137. 
  9. "Sugar-binding proteins from fish: selection of high affinity "lambodies" that recognize biomedically relevant glycans". ACS Chemical Biology 8 (1): 152–60. January 2013. doi:10.1021/cb300399s. PMID 23030719. 
  10. "Antigen recognition by variable lymphocyte receptors". Science 321 (5897): 1834–7. September 2008. doi:10.1126/science.1162484. PMID 18818359. Bibcode: 2008Sci...321.1834H. 
  11. "The evolution of adaptive immune systems". Cell 124 (4): 815–22. February 2006. doi:10.1016/j.cell.2006.02.001. PMID 16497590. 
  12. "Recognition of the Thomsen-Friedenreich pancarcinoma carbohydrate antigen by a lamprey variable lymphocyte receptor". The Journal of Biological Chemistry 288 (32): 23597–606. August 2013. doi:10.1074/jbc.M113.480467. PMID 23782692. 
  13. "Principles of Glycan Recognition". Essentials of Glycobiology (3rd ed.). Cold Spring Harbor Laboratory Press. 2015. doi:10.1101/glycobiology.3e.029. http://www.ncbi.nlm.nih.gov/books/NBK453057/. 
  14. "Recognition of microbial glycans by soluble human lectins". Current Opinion in Structural Biology. Carbohydrates: A feast of structural glycobiology • Sequences and topology: Computational studies of protein-protein interactions 44: 168–178. June 2017. doi:10.1016/j.sbi.2017.04.002. PMID 28482337. 
  15. "Novel Method Opens Door to Better Understanding Glycan–Protein Interactions" (in en-US). 2018-03-01. https://www.genengnews.com/topics/translational-medicine/novel-method-opens-door-to-better-understanding-glycan-protein-interactions/. 
  16. Oyelaran, Oyindasola; Gildersleeve, Jeffrey C. (2009-10-01). "Glycan Arrays: Recent Advances and Future Challenges". Current Opinion in Chemical Biology 13 (4): 406–413. doi:10.1016/j.cbpa.2009.06.021. ISSN 1367-5931. PMID 19625207. 
  17. Wang, Linlin; Cummings, Richard D; Smith, David F; Huflejt, Margaret; Campbell, Christopher T; Gildersleeve, Jeffrey C; Gerlach, Jared Q; Kilcoyne, Michelle et al. (2014-03-22). "Cross-platform comparison of glycan microarray formats". Glycobiology 24 (6): 507–517. doi:10.1093/glycob/cwu019. ISSN 0959-6658. PMID 24658466. 
  18. Raman, Rahul; Tharakaraman, Kannan; Sasisekharan, V; Sasisekharan, Ram (2016-10-25). "Glycan–protein interactions in viral pathogenesis". Current Opinion in Structural Biology 40: 153–162. doi:10.1016/j.sbi.2016.10.003. ISSN 0959-440X. PMID 27792989. 
  19. Taroni, Chiara; Jones, Susan; Thornton, Janet M. (2000-02-01). "Analysis and prediction of carbohydrate binding sites" (in en). Protein Engineering, Design and Selection 13 (2): 89–98. doi:10.1093/protein/13.2.89. ISSN 1741-0126. PMID 10708647. https://academic.oup.com/peds/article/13/2/89/1490499. 
  20. Kulharia, Mahesh; Bridgett, Stephen J.; Goody, Roger S.; Jackson, Richard M. (2009-10-01). "InCa-SiteFinder: A method for structure-based prediction of inositol and carbohydrate binding sites on proteins" (in en). Journal of Molecular Graphics and Modelling 28 (3): 297–303. doi:10.1016/j.jmgm.2009.08.009. ISSN 1093-3263. PMID 19762259. http://www.sciencedirect.com/science/article/pii/S1093326309001053. 
  21. Zhao, Huiying; Taherzadeh, Ghazaleh; Zhou, Yaoqi; Yang, Yuedong (2018). "Computational Prediction of Carbohydrate-Binding Proteins and Binding Sites" (in en). Current Protocols in Protein Science 94 (1): e75. doi:10.1002/cpps.75. ISSN 1934-3663. PMID 30106511. 
  22. Solís, Dolores; Bovin, Nicolai V.; Davis, Anthony P.; Jiménez-Barbero, Jesús; Romero, Antonio; Roy, René; Smetana, Karel; Gabius, Hans-Joachim (2015-01-01). "A guide into glycosciences: How chemistry, biochemistry and biology cooperate to crack the sugar code" (in en). Biochimica et Biophysica Acta (BBA) - General Subjects 1850 (1): 186–235. doi:10.1016/j.bbagen.2014.03.016. ISSN 0304-4165. PMID 24685397. http://www.sciencedirect.com/science/article/pii/S0304416514001202.