Biology:C2orf16

From HandWiki
Revision as of 00:53, 10 February 2024 by DanMescoff (talk | contribs) (change)
Short description: Protein-coding gene in the species Homo sapiens


A representation of the 3D structure of the protein myoglobin showing turquoise α-helices.
Generic protein structure example

C2orf16 is a protein that in humans is encoded by the C2orf16 gene. Isoform 2 of this protein (NCBI ID: CAH18189.1,[1] henceforth referred to as C2orf16) is 1,984 amino acids long.[2] The gene contains 1 exon and is located at 2p23.3.[3] Aliases for C2orf16 include Open Reading Frame 16 on Chromosome 2 and P-S-E-R-S-H-H-S Repeats Containing Sequence.[4]

68 orthologs are known for this gene, including in mice and sheep, but no paralogs have been found.[5]

Gene

C2orf16 isoform 2 is encoded by a 6.2 kb, single-exon gene at locus 2p23.3. The encoded protein contains P-S-E-R-S-H-H-S repeats toward its C-terminus, from amino acid 1,559 to 1,903. These repeats appear to have arisen from a transposable element. Primate orthologs carry more P-S-E-R-S-H-H-S repeats than other mammalian orthologs do.[3]
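Comparing repeat counts between orthologs can be illustrated with a short script. The following is a minimal sketch, assuming exact copies of the motif and using made-up toy sequences (not the real C2orf16 orthologs); the function names are hypothetical, and real repeats are degenerate, so exact matching would undercount them.

```python
import re

# The eight-residue motif named in the text, in one-letter amino acid code.
MOTIF = "PSERSHHS"


def count_repeats(protein_seq, motif=MOTIF):
    """Count non-overlapping exact copies of `motif` in `protein_seq`."""
    return len(re.findall(re.escape(motif), protein_seq.upper()))


def repeat_spans(protein_seq, motif=MOTIF):
    """Return 1-based, inclusive (start, end) positions of each exact copy."""
    pattern = re.escape(motif)
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, protein_seq.upper())]


# Hypothetical toy sequences for illustration only.
toy_primate = "MKT" + MOTIF * 5 + "GA"
toy_rodent = "MKT" + MOTIF * 2 + "GA"

print("toy primate copies:", count_repeats(toy_primate))
print("toy rodent copies:", count_repeats(toy_rodent))
print("toy rodent spans:", repeat_spans(toy_rodent))
```

A more realistic comparison would tolerate substitutions within the motif, for example by scoring each window against a profile rather than requiring identity.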

Expression

C2orf16 is highly expressed in the testes[6] and in a human embryonic stem cell line treated with retinoic acid and mitogen,[7] but it is not known to be differentially expressed with age or in disease phenotypes.[8] C2orf16 is also highly expressed in the pre-implantation embryo, from the 4-cell stage to the blastocyst stage.[9]

C2orf16 expression does not appear to be sensitive to rapamycin.[10] Its expression increases significantly in c-MYC knockdown breast cancer cells.[11]

mRNA

Isoforms

Two isoforms of C2orf16 exist. Isoform 1 is 5,388 amino acids long and is encoded by 5 exons spanning 16,401 base pairs. Isoform 2 uses an alternative transcription start site and is considerably shorter: 1,984 amino acids encoded by 1 exon spanning 6,200 base pairs.[5]
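The quoted lengths can be sanity-checked with simple arithmetic: each amino acid needs one three-base codon, plus one stop codon. The sketch below assumes the base-pair figures refer to transcript length (the text does not say so explicitly); the function name is hypothetical.

```python
def min_coding_bp(n_amino_acids):
    """Minimum coding length in base pairs: 3 per codon, plus one stop codon."""
    return 3 * (n_amino_acids + 1)


# Figures quoted in the text above.
isoforms = {
    "isoform 1": {"aa": 5388, "bp": 16401},
    "isoform 2": {"aa": 1984, "bp": 6200},
}

for name, d in isoforms.items():
    needed = min_coding_bp(d["aa"])
    spare = d["bp"] - needed  # bases left over for untranslated regions
    print(f"{name}: needs at least {needed} bp, leaving {spare} bp non-coding")
```

Under that assumption, both isoforms fit within their quoted lengths with a few hundred bases to spare for untranslated sequence.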

Expression Regulation

One miRNA (accession number MI0005564) is predicted to bind to the 3' UTR of C2orf16.[12][13]

Protein

C2orf16 has a predicted molecular weight of 224 kDa and a predicted isoelectric point of 10.08,[14] values that are relatively constant among orthologs. The protein has a higher than average content of serine, histidine, and arginine and a lower than average content of alanine.[15]
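Amino acid composition and a rough mass estimate can be computed directly from a sequence. This is a minimal sketch, not the tool cited above: the toy sequence is hypothetical, and the mass estimate uses the common rule of thumb of about 110 Da per residue rather than exact residue masses.

```python
from collections import Counter

AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an approximation


def composition(protein_seq):
    """Return the percentage of each residue type in `protein_seq`."""
    seq = protein_seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def approx_mass_kda(n_residues):
    """Rough molecular weight in kDa from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Hypothetical 100-residue toy sequence, rich in S, H, and R and poor in A.
toy = "PSERSHHS" * 10 + "A" * 2 + "L" * 18

for aa, pct in sorted(composition(toy).items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {pct:.1f}%")

# Rough estimate for a 1,984-residue protein such as isoform 2.
print(f"approximate mass: {approx_mass_kda(1984):.1f} kDa")
```

The rough estimate lands close to the predicted 224 kDa; the difference is expected because the real protein is enriched in heavier-than-average residues such as arginine and histidine.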

Compositional Features

A positive charge cluster is found from amino acid residues 1,274 to 1,302.[15]

An arginine rich region is found from amino acids 1,545 to 1,933, a serine rich region is found from amino acids 1,568 to 1,934, and a histidine rich region is found from amino acids 1,630 to 1,853.[15]

A dot matrix analysis[16] reveals a heavily repeated region from approximately residue 1,500 to 1,984, corresponding to the P-S-E-R-S-H-H-S repeat. A small band of dots at approximately amino acid 1,200 denotes a half repeat of the P-S-E-R-S-H-H-S sequence.

Dot matrix analysis of uncharacterized protein C2orf16 isoform 2. The P-S-E-R-S-H-H-S repeat sequence is visualized via the darker area of the matrix from amino acid 1500–1984, and a half P-S-E-R-S-H-H-S repeat sequence is seen as a band near amino acid 1200.
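The idea behind a self-comparison dot matrix can be sketched in a few lines. The code below is an illustration only and is not the EMBOSS dotmatcher algorithm: it scores windows by simple residue identity rather than a substitution matrix, and the window size, threshold, and toy sequence are all assumptions.

```python
def dot_matrix(seq_a, seq_b, window=8, min_matches=6):
    """Return the set of 0-based (i, j) positions where the windows starting
    at seq_a[i] and seq_b[j] share at least `min_matches` identical residues."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(dots, n_rows, n_cols):
    """Draw the matrix as text: '*' for a dot, '.' for no match."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n_cols))
        for i in range(n_rows)
    )


# Hypothetical toy sequence: a 17-residue non-repetitive stretch
# followed by three copies of the repeat.
toy = "MKTAYIAKQRQISFVKG" + "PSERSHHS" * 3
dots = dot_matrix(toy, toy)
size = len(toy) - 8 + 1
print(render(dots, size, size))
```

In the printed matrix, the main diagonal is the sequence matching itself, and the extra diagonals in the lower-right corner mark the repeated region, which is the same pattern that appears as the darker area in the figure described above.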

C2orf16 isoform 2 has no transmembrane domains[17] and is predicted to be localized to the nucleus after translation, owing to two nuclear localization sequences predicted at residues 1,233 and 1,281.[18] No nuclear export sequence is conserved among orthologs,[19] suggesting that C2orf16 does not leave the nucleus after import. No N- or C-terminal modifications were predicted.[20][21][22][23]

Sub-cellular Localization

C2orf16 is predicted to be localized to the nucleus after translation.[5]

Structure

C2orf16 Isoform 2 predicted 3D structure showing the three major domains of the protein. Domain 3 contains the P-S-E-R-S-H-H-S repeat sequence.

The 3D structure of C2orf16 is predicted to have three major domains. Domain 1 spans amino acids 1 to 662, domain 2 spans amino acids 674 to 1,487, and domain 3 spans amino acids 1,488 to 1,984.[24] Domains 1 and 2 are predicted to be connected by a stretch of 12 amino acids that is not organized into a secondary structure, allowing flexibility between the two domains. Domain 2 is predicted to contain protein-interacting domains for transcription factors.[24] Domain 3 is predicted to follow a "balls on a string" structure[24] and has many possible phosphorylation sites.[25]

Protein Interactions

C2orf16 has been shown to have a physical interaction with proto-oncogene Myc by tandem affinity purification.[26]

Ortholog Phylogeny

68 orthologs are known for C2orf16.[5] The protein seems to have appeared in mammalian evolutionary history 320 million years ago, around the divergence of mammals from reptiles. This history would explain why orthologs do not exist in amphibians, reptiles, birds, or other more distantly related species.[27]

Any orthologs from species more distantly related to humans than other mammals are likely unrelated in function. However, the P-S-E-R-S-H-H-S repeat is present in bony fishes, crustaceans, stramenopiles including potato blight, plants, and prokaryotes.[27]

The transposon repeat may have been reintroduced to mammals by a viral vector.

Repeat Sequence

P-S-E-R-S-H-H-S Repeat Sequence Logo

The P-S-E-R-S-H-H-S repeat sequence is conserved in orthologs of C2orf16, and also in organisms as distantly related as the oomycete Phytophthora infestans[28] and plants, including the chloroplasts of Ashby's Wattle.[29] The S-P-S-E-R portion of the repeat appears to be the most strongly conserved, as seen by alignment with these orthologs and by creation of a Logo.[30]

The conservation analysis of the repeat shows that the initial S-P-S is highly conserved, possibly for phosphorylation (S) and structure (P). The R is almost completely conserved, mutating to a lysine in some orthologs,[29] which implies that the positive charge is necessary for the purpose of the repeat.
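The per-position conservation that a sequence logo displays can be approximated as information content in bits. The sketch below is not the WebLogo implementation: it omits the small-sample correction, and the aligned repeat copies are hypothetical examples rather than real ortholog data.

```python
import math
from collections import Counter


def column_information(aligned, alphabet_size=20):
    """Information content (bits) of each alignment column:
    log2(alphabet_size) minus the Shannon entropy of the column."""
    info = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts.values())
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Hypothetical aligned repeat copies; one carries an R-to-K substitution.
copies = ["PSERSHHS", "PSERSHHS", "PSEKSHRS", "PSERSYHS"]
info = column_information(copies)

for position, bits in enumerate(info, start=1):
    print(f"position {position}: {bits:.2f} bits")
```

Fully conserved columns reach the maximum of log2(20) bits, while a column that mixes arginine and lysine scores lower even though both residues carry a positive charge; capturing that kind of chemical similarity would require grouping residues by property before counting.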

The 3D shape of the repeat sequence is unclear as it has been predicted to be either balls-on-a-string[31] or an antiparallel beta-sheet[3] structure.

Function

C2orf16 isoform 2 is predicted to have a possible function in mitosis regulation through its nuclear localization,[5][18] predicted transcription factor binding site,[24] physical association with Myc,[26] and increased expression in c-MYC knockdown breast cancer cells.[11]

Clinical Significance

There are four patents on record for C2orf16, one each involving cancerous PPP2R1A and ARID1A mutations,[32] Alzheimer's predisposition,[33] viral vaccine diversity,[34] and the relation of copy number variation to common variable immunodeficiency.[35] C2orf16 also shows increased expression in some breast cancer lines[11] and interacts with Myc,[26] a common oncogene, making C2orf16 a possible oncogene to target in cancer treatments.

References

  1. "hypothetical protein [Homo sapiens] – Protein – NCBI". https://www.ncbi.nlm.nih.gov/protein/CAH18189.1?report=genbank&log$=protalign&blast_rank=2&RID=CNUFCEWG015.
  2. "C2orf16 Gene". Weizmann Institute of Science, Life Map Sciences. https://www.genecards.org/cgi-bin/carddisp.pl?gene=C2orf16&keywords=C2orf16. 
  3. "C2orf16 – Uncharacterized protein C2orf16 – Homo sapiens (Human) – C2orf16 gene & protein". https://www.uniprot.org/uniprot/Q68DN1.
  4. "C2orf16 Gene". Weizmann Institute of Science, Life Map Sciences. https://www.genecards.org/cgi-bin/carddisp.pl?gene=C2orf16&keywords=C2orf16. 
  5. "Ensembl 2018". Nucleic Acids Research 46 (D1): D754–D761. January 2018. doi:10.1093/nar/gkx1098. PMID 29155950. PMC 5753206. https://www.ebi.ac.uk/s4/jump?from=aHR0cDovL3d3dy5lYmkuYWMudWsvczQvc3VtbWFyeS9tb2xlY3VsYXI/dGVybT1DMm9yZjE2JmNsYXNzaWZpY2F0aW9uPTk2MDYmdGlkPW5hbWVPcmdFTlNHMDAwMDAyMjE4NDM=&hash=hash&url=https://www.ensembl.org/Homo_sapiens/Gene/Compara_Ortholog?db=core;g=ENSG00000221843;r=2:27537386-27582721.
  6. "Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays". Science 302 (5653): 2141–4. December 2003. doi:10.1126/science.1090100. PMID 14684825. Bibcode: 2003Sci...302.2141J.
  7. "Homo sapiens gene C2orf16, encoding chromosome 2 open reading frame 16.". NCBI National Institute of Health. https://www.ncbi.nlm.nih.gov/ieb/research/acembly/av.cgi?db=human&term=C2orf16&submit=Go. 
  8. "EST Profile – Hs.131021". https://www.ncbi.nlm.nih.gov/UniGene/ESTProfileViewer.cgi?uglist=Hs.131021. 
  9. "Rewirable gene regulatory networks in the preimplantation embryonic development of three mammalian species". Genome Research 20 (6): 804–15. June 2010. doi:10.1101/gr.100594.109. PMID 20219939. 
  10. "Akt inhibitors MK-2206 and nelfinavir overcome mTOR inhibitor resistance in diffuse large B-cell lymphoma". Clinical Cancer Research 18 (9): 2534–44. May 2012. doi:10.1158/1078-0432.CCR-11-1407. PMID 22338016. 
  11. "Novel c-MYC target genes mediate differential effects on cell proliferation and migration". EMBO Reports 8 (1): 70–6. January 2007. doi:10.1038/sj.embor.7400849. PMID 17159920.
  12. Izaurralde, Elisa, ed (August 2015). "Predicting effective microRNA target sites in mammalian mRNAs". eLife 4: e05005. doi:10.7554/eLife.05005. PMID 26267216. 
  13. "miRNA Entry for MI0005564". http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0005564. 
  14. "ExPASy – Compute pI/Mw tool". https://web.expasy.org/compute_pi/. 
  15. "The EMBL-EBI search and sequence analysis tools APIs in 2019". Nucleic Acids Research 47 (W1): W636–W641. April 2019. doi:10.1093/nar/gkz268. PMID 30976793.
  16. "EMBOSS: dotmatcher". http://www.bioinformatics.nl/cgi-bin/emboss/dotmatcher. 
  17. "On filtering false positive transmembrane protein predictions". Protein Engineering 15 (9): 745–52. September 2002. doi:10.1093/protein/15.9.745. PMID 12456873. 
  18. "PROSITE, a protein domain database for functional characterization and annotation". Nucleic Acids Research 38 (Database issue): D161-6. January 2010. doi:10.1093/nar/gkp885. PMID 19858104.
  19. "Analysis and prediction of leucine-rich nuclear export signals". Protein Engineering, Design & Selection 17 (6): 527–36. June 2004. doi:10.1093/protein/gzh062. PMID 15314210. 
  20. "NetAcet: prediction of N-terminal acetylation sites". Bioinformatics 21 (7): 1269–70. April 2005. doi:10.1093/bioinformatics/bti130. PMID 15539450. 
  21. "N-Terminal myristoylation predictions by ensembles of neural networks". Proteomics 4 (6): 1626–32. June 2004. doi:10.1002/pmic.200300783. PMID 15174132. 
  22. "Computational prediction of N-linked glycosylation incorporating structural properties and patterns". Bioinformatics 28 (17): 2249–55. September 2012. doi:10.1093/bioinformatics/bts426. PMID 22782545. 
  23. "NetCGlyc 1.0: prediction of mammalian C-mannosylation sites". Glycobiology 17 (8): 868–76. August 2007. doi:10.1093/glycob/cwm050. PMID 17494086. 
  24. "I-TASSER server: new development for protein structure and function predictions". Nucleic Acids Research 43 (W1): W174-81. July 2015. doi:10.1093/nar/gkv342. PMID 25883148.
  25. "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology 294 (5): 1351–62. December 1999. doi:10.1006/jmbi.1999.3310. PMID 10600390. 
  26. "PSICQUIC View". http://www.ebi.ac.uk/Tools/webservices/psicquic/view/results.xhtml?conversationContext=1.
  27. Madden, Tom (2003-08-13) (in en). The BLAST Sequence Analysis Tool. National Center for Biotechnology Information (US). https://www.ncbi.nlm.nih.gov/books/NBK21097/.
  28. "cyst germination specific acidic repeat protein precursor [Phytophthora infestans]". https://www.ncbi.nlm.nih.gov/protein/AAC72308.1?report=fasta.
  29. "accD (chloroplast) [Acacia ashbyae] – Protein – NCBI". https://www.ncbi.nlm.nih.gov/protein/CUR00717.1?report=genpept.
  30. "WebLogo – Create Sequence Logos". https://weblogo.berkeley.edu/logo.cgi. 
  31. "I-TASSER server for protein 3D structure prediction". BMC Bioinformatics 9 (1): 40. January 2008. doi:10.1186/1471-2105-9-40. PMID 18215316. 
  32. Vogelstein, Bert; Kinzler, Kenneth W.; Velculescu, Victor; Papadopoulous, Nickolas; Jones, Sian (May 31, 2012). US Patent 9,982,304. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&p=1&f=G&l=50&d=PTXT&S1=C2orf16&OS=C2orf16&RS=C2orf16. Retrieved Feb 5, 2019.
  33. Nagy, Zsuzsanna (Jan 16, 2014). US Patent 9,944,986. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=C2orf16&OS=C2orf16&RS=C2orf16. Retrieved Feb 5, 2019.
  34. Shenk, Thomas; Wang, Dai (April 16, 2009). US Patent 9,439,960. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=3&p=1&f=G&l=50&d=PTXT&S1=C2orf16&OS=C2orf16&RS=C2orf16. Retrieved Feb 5, 2019.
  35. Hakonarson, Hakon; Glessner, Joseph; Orange, Jordan (Nov 28, 2013). US Patent 9,109,254. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=4&p=1&f=G&l=50&d=PTXT&S1=C2orf16&OS=C2orf16&RS=C2orf16. Retrieved Feb 5, 2019.