Biology:C2orf74

Chromosomal location of C2orf74. Image made using the NCBI Genome Decoration Page.[1]

C2orf74, also known as LOC339804, is a protein-encoding gene located on the short arm of chromosome 2 at position 15 (2p15).[2] The gene is 19,713 base pairs long.[2] C2orf74 has orthologs in 135 different species,[2] primarily placental mammals along with some marsupials.

The protein encoded by the C2orf74 gene has two isoforms, the longer of which (isoform 1) is 187 amino acids in length.[3] This protein is linked to the development of autoimmune disorders such as ankylosing spondylitis[4] and to diseases affecting the colon.[5][6][7]

Gene

C2orf74 is a gene located on the plus strand at 2p15 in humans.[2] It is 19,713 base pairs in length, beginning at position 61,145,116 and ending at position 61,164,828, and includes 8 exons.[2] Other genes in its neighborhood include KIAA1841, LOC105374759, LOC105374758, LOC339803, AHSA2P, USP34, and SNORA70B.[2]
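The stated length follows from the coordinates, since an interval given by inclusive start and end positions spans end − start + 1 bases. A minimal Python check, with names chosen here purely for illustration:

```python
# Inclusive genomic coordinates of C2orf74 on chromosome 2, as quoted above.
GENE_START = 61_145_116
GENE_END = 61_164_828

def span_length(start: int, end: int) -> int:
    """Length in base pairs of an interval with inclusive start and end."""
    return end - start + 1

gene_length = span_length(GENE_START, GENE_END)
print(gene_length)  # 19713
```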

Transcripts

Transcript variants

C2orf74 has 6 validated mRNA products, created via alternative splicing, that give rise to two different isoforms.[2] An extended version of isoform 1 that uses an upstream in-frame start codon has also been sequenced, though this protein product is not formally acknowledged as a separate isoform by NCBI.[8]

C2orf74 transcript variants
Name | mRNA accession number | Transcript length | Number of exons | Protein length | Isoform
Transcript variant 1 | NM_001143959.4 | 1097 bp | 5 | 187 aa | 1
Transcript variant 2 | NM_001143960.3 | 851 bp | 4 | 115 aa | 2
Transcript variant 3 | NM_001316317.2 | 737 bp | 3 | 115 aa | 2
Transcript variant 4 | NM_001367069.1 | 1002 bp | 5 | 115 aa | 2
Transcript variant 5 | NM_001367070.1 | 1124 bp | 6 | 115 aa | 2
Transcript variant 6 | NM_001367071.1 | 973 bp | 5 | 115 aa | 2
Transcript variant 1 extension | A8MZ97 | 1097 bp | 5 | 194 aa | 1+

The above table is a compilation of the transcript variants of C2orf74 acknowledged on the C2orf74 gene page of NCBI.

Proteins

There are two known isoforms of the C2orf74-encoded protein. Isoform 1 is derived from transcript variant 1 and is 187 amino acids in length.[3] A putative N-terminal extension of this isoform uses an upstream in-frame start codon and adds 7 amino acids to the start of isoform 1, bringing the length of the protein to 194 amino acids.[8] Isoform 2 is derived from any one of transcript variants 2, 3, 4, 5, or 6.[2] It is produced from an alternative promoter, has a different 5' UTR, and has a shorter N-terminal end that excludes the first 3 exons, which encode the N-terminal end of isoform 1. The result is a shorter protein, 115 amino acids in length, that lacks the highly conserved transmembrane domain found at the N-terminal end of isoform 1.[9]

C2orf74 protein isoforms
Name | Transcript variant | Peptide length | Domains present
Isoform 1 extension | 1 extension | 194 aa | TMEM, DUF
Isoform 1 | 1 | 187 aa | TMEM, DUF
Isoform 2 | 2, 3, 4, 5, 6 | 115 aa | DUF

C2orf74 MSA.png

The above figure depicts a conceptual translation of isoform 1 of C2orf74 made using SixFrame.[10] Exon boundaries are depicted in blue font. The 5' UTR of the transcript is shown to have an upstream in-frame stop codon (red) and an upstream in-frame start codon (green). The putative N-terminal extension is depicted in light gray. The N-terminal transmembrane domain is highlighted in lavender. Regions conserved among orthologs are highlighted in cyan, while regions prone to deletion are highlighted in gray. Phosphorylation sites are highlighted in red with the phosphorylated amino acid underlined. Significant SNPs are highlighted in pink, with a key pictured to the right detailing the type of change and the reason for inclusion. Polyadenylation signals in the 3' UTR are highlighted in orange.

Isoform 1

Isoform 1 of the C2orf74 protein has a calculated molecular weight of approximately 21 kDa and a pI of 5.74.[11][12] It does not display any unusual amino acid composition, cysteine spacing, number of multiplets, or periodicity.[13] This protein isoform has a putative 7 aa N-terminal extension.[8] It contains a 21 aa transmembrane region at position 7.[3]
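As a rough sanity check on the quoted molecular weight, one can multiply the peptide length by an average residue mass. The figure of about 110 Da per residue is a common rule of thumb and is an assumption here; the cited tools instead sum exact residue masses from the actual sequence, so their values will differ somewhat.

```python
# Rule-of-thumb average mass of one amino acid residue (an assumption,
# not a sequence-specific value).
AVERAGE_RESIDUE_MASS_DA = 110.0

def approx_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from its length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0

# Peptide lengths taken from the isoform table above.
isoform_lengths = {"isoform 1": 187, "isoform 1 extension": 194, "isoform 2": 115}
estimates = {name: approx_mass_kda(n) for name, n in isoform_lengths.items()}

for name, kda in estimates.items():
    print(f"{name}: roughly {kda:.2f} kDa")
```

For isoform 1 this gives a little over 20.5 kDa, in line with the approximately 21 kDa reported.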

Domains

The transmembrane region begins 7 amino acids from the N-terminal end of the protein and ends at the 29th amino acid in humans. This region has been identified by NCBI[3] and is supported by biochemical analysis. The biochemical features that characterize it as a transmembrane region include a neutral charge cluster, a high-scoring hydrophobic segment, and alpha-helical secondary structure.[13][14] The region is also highly conserved among all orthologs, suggesting that it is functionally significant.[15]
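A hydrophobic segment of the kind described above is commonly located with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, settings conventionally used for transmembrane helices. It is not the procedure used by the tools cited here, and the example sequence is made up for illustration rather than taken from C2orf74.

```python
# Kyte-Doolittle hydropathy value for each amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def hydrophobic_windows(seq, window=19, cutoff=1.6):
    """1-based start positions of windows whose mean exceeds the cutoff."""
    return [i + 1 for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean > cutoff]

# Hypothetical toy sequence: polar start, hydrophobic stretch, polar tail.
toy = "MSEDKR" + "LLVIALLVGAILLVIALLV" + "KRDESEQNKRDESEQ"
hits = hydrophobic_windows(toy)
print(hits)
```

Windows that overlap the hydrophobic stretch score above the cutoff, while windows dominated by the charged tail do not.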

The region downstream of the transmembrane region is considered a domain of unknown function (DUF) within pfam 15484.[3] Approximately 52% of this portion of the protein is considered to be disordered, making confidence in prediction of domain function difficult.[16] However, the C-terminal end is highly conserved among all orthologs.[15]

Antibody staining results from The Human Protein Atlas.[17] Immunocytochemical antibody staining results are listed as showing localization to the centrosome (green, frame A). Other examples of the same antibody staining, as well as immunohistochemical results, show a strong presence of the gene product in the cytoplasm (green, frame B; dark brown, frame C).

Structure

C2orf74 isoform 1 is predicted to be dominated by helical secondary structure, with only short regions predicted to adopt beta sheet conformations.[14] Predictions of tertiary structure tend to show a globular DUF at the end of a helical transmembrane domain.[16][18] Structural predictions of isoform 2, which includes only the DUF, also appear globular.[16][18]

Subcellular localization

The presence of a transmembrane domain indicates that isoform 1 of the C2orf74 product resides in a membranous cellular structure. Analysis of likely subcellular localization among orthologs indicates that the C2orf74 product is most likely found in the nuclear membrane, mitochondria, or endoplasmic reticulum.[19] Immunocytochemical imaging shows C2orf74 localized to the centrosome, while immunohistochemical imaging shows it concentrated in the cytosol.[17]

Gene level regulation

Promoter

Bird's-eye view of the coding region of the C2orf74 gene and the promoters in the region. Red boxes represent C2orf74 exons, while blue arrows represent promoter regions.

C2orf74 has 3 possible promoters that produce complete protein isoforms. Isoform 1 could be made by either GXP_6040264 or GXP_2056207, though GXP_6040264 shows the most promise, as it has a higher number of CAGE tags (249) than GXP_2056207 (133), and is conserved among several orthologs. Isoform 2 is made by the promoter GXP_649849.[20]

GXP_6040264 contains over 300 transcription factor binding sites, with a fork head domain factor (V$FKHD), a bromodomain and phd domain transcription factor (V$BPTF), and a sex/testes determining and related HMG box factor (V$SORY) being the most conserved regions among mammals.[20]

Expression

C2orf74 is expressed at minimal levels in several cell types. Due to the low levels of expression, meaningful trends in localization are difficult to discern.[2] In situ hybridization of C2orf74 and some RNA sequencing assays indicate potential for localization in the cerebellum.[2][21] Microarray data from NCBI GEO indicates lower levels of C2orf74 expression in individuals with colorectal tumors such as adenomas or cancerous colorectal tumors when compared to normal mucosa or tumors of non-colorectal origin such as carcinomas.[22]

Transcript level regulation

Predicted 3D structure of the human 3' UTR. Stem-loops are colored red, yellow, green, cyan, blue, purple, and magenta. Potential miRNA binding sites are labelled in light pink, and polyadenylation sites are labelled in orange.

The 5' UTR of transcript variant 1 is 232 bp in length and features an upstream in-frame stop codon as well as an upstream in-frame start codon.[9] If used, this start codon would add a 7 aa N-terminal extension to the protein encoded by transcript variant 1.[8] Analysis of the predicted structure of the 5' UTR of transcript variant 1 shows 2 hairpin structures. The 5' UTR of transcript variants 2 through 6 differs from that of transcript variant 1. The 5' UTR also differs greatly between orthologs, suggesting that it may not be a region of great importance for transcript-level regulation.

The 3' UTR is conserved among all human transcript variants, though it does not show significant conservation among mammalian species. It is 301 bp in length and contains two polyadenylation signals, at 981 bp and 1071 bp respectively.[9] It also contains two partially conserved miRNA binding sites at 73 bp (has-mir-241) and 270 bp (has-miR-23),[23] though neither of the miRNAs predicted to bind appears to be present in the human transcriptome.[24] The human 3' UTR is rich in stem-loop structures.
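The figures quoted for transcript variant 1 can be cross-checked against one another. The sketch below assumes that the coding sequence consists of 187 codons plus one stop codon, and that the polyadenylation signal positions are transcript coordinates rather than offsets within the 3' UTR. Both readings are inferences from the numbers, not statements from the cited records, and the variable names are illustrative.

```python
FIVE_UTR_BP = 232       # 5' UTR length of transcript variant 1
PROTEIN_AA = 187        # isoform 1 length
THREE_UTR_BP = 301      # 3' UTR length
TRANSCRIPT_BP = 1097    # length of NM_001143959.4 from the transcript table

cds_bp = (PROTEIN_AA + 1) * 3                 # codons plus an assumed stop codon
total_bp = FIVE_UTR_BP + cds_bp + THREE_UTR_BP
three_utr_start = FIVE_UTR_BP + cds_bp + 1    # 1-based transcript coordinate

print(cds_bp, total_bp, three_utr_start)  # 564 1097 797

# Read as transcript coordinates, both polyadenylation signals lie in the 3' UTR.
polya_in_utr = {pos: three_utr_start <= pos <= TRANSCRIPT_BP for pos in (981, 1071)}
print(polya_in_utr)
```

Under these assumptions the three segments sum exactly to the 1097 bp listed for transcript variant 1.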

Protein level regulation

C2orf74 is predicted to have 4 CK2 phosphorylation sites and 3 PKC phosphorylation sites.[25] CK2 and PKC phosphorylation sites are common among many orthologs. Myristoylation sites are also common among C2orf74 orthologs, though they are less conserved.[26]
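Consensus-motif scans of this kind can be approximated with regular expressions. The sketch below uses the PROSITE consensus patterns for CK2 sites, [ST]-x(2)-[DE], and PKC sites, [ST]-x-[RK]. The example peptide is hypothetical, and such short patterns match frequently by chance, so hits should be treated only as candidates.

```python
import re

# PROSITE-style consensus patterns written as lookahead regexes so that
# overlapping sites are all reported.
MOTIFS = {
    "CK2": re.compile(r"(?=([ST]..[DE]))"),
    "PKC": re.compile(r"(?=([ST].[RK]))"),
}

def find_sites(seq):
    """Map each motif name to a list of (1-based position, matched residues)."""
    return {name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}

# Hypothetical peptide, not the C2orf74 sequence.
peptide = "MASGKDETLRAASPEDLK"
sites = find_sites(peptide)
print(sites)
```

The reported position is that of the serine or threonine that would be phosphorylated.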

Significance of phosphorylation sites

CK2

Casein kinase 2 is a serine/threonine-specific protein kinase that plays a significant role in cell signaling pathways related to cell cycling, regulation, and development. Its association with C2orf74 may implicate C2orf74 as a member of an intracellular phosphorylation chain governing cell development, and may explain its association with conditions such as cancer and autoimmunity.

PKC

Protein kinase C is a family of protein kinases that are serine and threonine specific and play a role in regulating a broad range of cellular functions, particularly those involving phosphorylation cascades. As with CK2, C2orf74's association with PKC may implicate it as a signaling molecule involved in a phosphorylation cascade. This may provide context as to the nature of C2orf74's relationship to autoimmune disease and cancer.

Homology

Orthologs

C2orf74 first appeared in mammals and is found in animals as distantly related to humans as marsupials.[27] The table below highlights 20 selected orthologs from various mammalian clades, arranged by date of divergence from the human lineage. Red tiles indicate high similarity to the human sequence and blue tiles indicate low similarity. In general, the orthologs follow the expected pattern in which more recent divergence corresponds to greater sequence similarity. Notable exceptions, however, include the galago, mouse, and manatee.

C2orf74 orthologs .png

Rate of evolution

The figures below show in more detail the evolutionary history of C2orf74. To the right is a comparison of the divergence rate of C2orf74 compared to that of cytochrome C and fibrinogen alpha. Given that fibrinogen alpha in this figure serves as a standard example of a rapidly changing protein, one can see that C2orf74 is evolving quite quickly.

Figure 4: Rate of evolution comparison between C2orf74, Cytochrome C, and Fibrinogen Alpha. C2orf74 appears to evolve even faster than Fibrinogen alpha, which serves as a standard for rapidly evolving genes.

Protein interactions

Predicted interacting proteins

Three proteins have been predicted to interact with C2orf74: POT1, SMAGP, and SRPK1.

POT1

POT1 is a telomere end binding protein. It is as of yet unclear how this relates to the predicted function of C2orf74 given previous research and predictions of subcellular localization.

SMAGP

SMAGP is a small transmembrane and glycosylated protein.[28] Association with SMAGP is plausible given that both proteins are predicted to localize to the nuclear membrane. It is possible that C2orf74 and SMAGP act together as part of a protein complex associated with intracellular signaling pathways.

SRPK1

SRPK1 is a protein kinase localized to the nucleus and cytoplasm. Association with SRPK1 also makes sense for C2orf74 given the subcellular localization of both proteins and implication in phosphorylative processes.

Clinical significance

Disease association

Bowel disease

Several studies have been able to link differential C2orf74 functionality to bowel disease. Two separate studies have identified C2orf74 as a potential susceptibility locus for Crohn's disease.[6][7] Furthermore, various studies reported in NCBI GEO show differential expression of C2orf74 in benign and cancerous colorectal tumor tissues.[29]

Left: Microarray data from NCBI GEO showing a decreased level of C2orf74 expression in colorectal cancer cells, regardless of whether they were positive or negative for CD133 (a proposed biomarker for cancer), but not in other cell types such as carcinoma-associated fibroblasts. Right: Microarray data from NCBI GEO showing a decreased level of C2orf74 expression in colorectal adenomas, but not in normal mucosa. Note that adenomas are benign tumors that arise from normal mucosa, making the difference in C2orf74 expression relevant.

Autoimmune disease

Aside from Crohn's disease, C2orf74 has also been identified as a susceptibility locus for ankylosing spondylitis[4] and, more generally, for other unspecified autoimmune conditions.[5] The SNP believed to play a role in C2orf74's relationship to ankylosing spondylitis is found within the coding region of the gene and is marked in the conceptual translation in the Proteins section above.[6]

Mutations (SNPs of interest)

At amino acid 36 there is a missense SNP that results in either a tyrosine (Tyr, Y) or an aspartate (Asp, D). This SNP, which is associated with ankylosing spondylitis, is found at 319 bp on transcript variant 1.[4]

References

  1. "Genome Decoration Page". https://www.ncbi.nlm.nih.gov/genome/tools/gdp/. 
  2. "C2orf74 chromosome 2 open reading frame 74 [Homo sapiens (human)] - Gene - NCBI". https://www.ncbi.nlm.nih.gov/gene/339804. 
  3. "uncharacterized protein C2orf74 isoform 1 [Homo sapiens] - Protein - NCBI". https://www.ncbi.nlm.nih.gov/protein/NP_001137431.2. 
  4. Wang, Mengmeng; Xin, Lihong; Cai, Guoqi; Zhang, Xu; Yang, Xiao; Li, Xiaona; Xia, Qing; Wang, Li et al. (2017-05-11). "Pathogenic variants screening in seventeen candidate genes on 2p15 for association with ankylosing spondylitis in a Han Chinese population" (in en). PLOS ONE 12 (5): e0177080. doi:10.1371/journal.pone.0177080. ISSN 1932-6203. PMID 28493913. Bibcode: 2017PLoSO..1277080W. 
  5. Gabrielsen, Ingvild S. M.; Amundsen, Silja Svanstrøm; Helgeland, Hanna; Flåm, Siri Tennebø; Hatinoor, Nimo; Holm, Kristian; Viken, Marte K.; Lie, Benedicte A. (2016-07-15). "Genetic risk variants for autoimmune diseases that influence gene expression in thymus" (in en). Human Molecular Genetics 25 (14): 3117–3124. doi:10.1093/hmg/ddw152. ISSN 0964-6906. PMID 27199374. https://academic.oup.com/hmg/article/25/14/3117/2525777. 
  6. Franke, Andre; McGovern, Dermot P. B.; Barrett, Jeffrey C.; Wang, Kai; Radford-Smith, Graham L.; Ahmad, Tariq; Lees, Charlie W.; Balschun, Tobias et al. (December 2010). "Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci". Nature Genetics 42 (12): 1118–1125. doi:10.1038/ng.717. ISSN 1546-1718. PMID 21102463. 
  7. Kenny, Eimear E.; Pe'er, Itsik; Karban, Amir; Ozelius, Laurie; Mitchell, Adele A.; Ng, Sok Meng; Erazo, Monica; Ostrer, Harry et al. (2012). "A genome-wide scan of Ashkenazi Jewish Crohn's disease suggests novel susceptibility loci". PLOS Genetics 8 (3): e1002559. doi:10.1371/journal.pgen.1002559. ISSN 1553-7404. PMID 22412388. 
  8. "RecName: Full=Uncharacterized protein C2orf74 - Protein - NCBI". https://www.ncbi.nlm.nih.gov/protein/A8MZ97. 
  9. "Homo sapiens chromosome 2 open reading frame 74 (C2orf74), transcript variant 1, mRNA" (in en-US). 2020-09-16. http://www.ncbi.nlm.nih.gov/nuccore/NM_001143959.4. 
  10. "Six-Frame Translation". https://www.bioline.com/media/calculator/01_13.html. 
  11. "C2orf74 Gene - GeneCards | CB074 Protein | CB074 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=C2orf74. 
  12. "ExPASy - Compute pI/Mw tool". https://web.expasy.org/compute_pi/. 
  13. "SAPS < Sequence Statistics < EMBL-EBI". https://www.ebi.ac.uk/Tools/seqstats/saps/. 
  14. "Bioinformatics Toolkit". https://toolkit.tuebingen.mpg.de/tools/ali2d. 
  15. "Clustal Omega < Multiple Sequence Alignment < EMBL-EBI". https://www.ebi.ac.uk/Tools/msa/clustalo/. 
  16. "PHYRE2 Protein Fold Recognition Server". http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index. 
  17. "The Human Protein Atlas". https://www.proteinatlas.org/. 
  18. "I-TASSER server for protein structure and function prediction". https://zhanglab.ccmb.med.umich.edu/I-TASSER/. 
  19. "PSORT II Prediction". https://psort.hgc.jp/form2.html. 
  20. "Genomatix" (in de-DE). https://www.genomatix.de/. 
  21. "Brain Map - brain-map.org". https://portal.brain-map.org/. 
  22. "Home - GEO DataSets - NCBI". https://www.ncbi.nlm.nih.gov/gds/?term=. 
  23. "TargetScanHuman 7.2". http://www.targetscan.org/vert_72/. 
  24. "miRDB - MicroRNA Target Prediction Database". http://www.mirdb.org/. 
  25. "Motif Scan" (in en). https://myhits.sib.swiss/cgi-bin/motif_scan. 
  26. "ExPASy - Myristoylation tool". https://web.expasy.org/myristoylator/. 
  27. "TimeTree :: The Timescale of Life". http://www.timetree.org/. 
  28. "SMAGP Gene - GeneCards | SMAGP Protein | SMAGP Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=SMAGP. 
  29. "GEO Profile Links for Gene (Select 339804) - GEO Profiles - NCBI". https://www.ncbi.nlm.nih.gov/geoprofiles?LinkName=gene_geoprofiles&from_uid=339804.