Biology:C22orf23

Short description: Protein-coding gene in the species Homo sapiens




C22orf23 (Chromosome 22 Open Reading Frame 23) is a protein that in humans is encoded by the C22orf23 gene. Its predicted secondary structure consists of alpha helices and disordered/coil regions. It is expressed in many tissues, most highly in the testes, and has orthologs in many species.

Gene

Conceptual translation of C22orf23. The reference sequence, NM_032561.4, was conceptually translated and aligned with the predicted peptide using Bioline. Predicted post-translational modifications are highlighted in colors that correspond to the key on the right. The start codon is highlighted in green, the stop codon in red, the 6 exon-exon junctions in light blue, the polyadenylation tail in orange, and highly conserved amino acids in purple.

Size and locus

C22orf23 is a human (Homo sapiens) gene located on the minus strand of chromosome 22 at map position 22q13.1. It spans 10,620 base pairs.[1][2] Its mRNA transcript is 1,988 base pairs long and has 7 exons.[3] Its predicted functional annotations are protein binding and the general category "molecular function".[1]

Common aliases

Aliases of C22orf23 include UPF0193 Protein EVG1, DJ1039K5.6, EVG1,[4] FLJ32787, and LOC84645.[5]

Protein

Primary sequence

The protein encoded by the mRNA sequence is 217 amino acids in length[2] and has a predicted molecular mass of 25 kDa.[4][6] The predicted isoelectric point is 9.8.[7] It is located in the nucleus.[6]
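As a rough sanity check on the predicted mass, a protein's mass can be approximated from its length alone. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is introduced here and is not the method behind the cited value, which was computed from the actual sequence. The function and constant names are illustrative.

```python
# Rough mass estimate from sequence length alone.
# ASSUMPTION: ~110 Da average mass per amino acid residue (a rule of thumb);
# the cited 25 kDa figure was computed from the actual sequence, not this way.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(length_aa: int) -> float:
    """Return an approximate molecular mass in kDa for a protein of the given length."""
    return length_aa * AVERAGE_RESIDUE_MASS_DA / 1000.0


approx_kda = estimate_mass_kda(217)  # 217 residues, as stated above
print(f"Approximate mass: {approx_kda:.1f} kDa")
```

The length-based estimate comes out close to 24 kDa, which is in line with the predicted 25 kDa.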

Domains and motifs

It is predicted to be an intracellular protein[6] and has no predicted transmembrane domains.[6][8][9] Given its intracellular location and lack of predicted transmembrane domains, it is likely a globular protein.

Post-Translational Modifications

Predicted secondary structure of C22orf23, generated with Phyre2.[10] 53% of the sequence is predicted to be disordered, and these regions cannot be modeled with confidence. The model covers 28% of the sequence with 42.9% confidence.

C22orf23 has many predicted post-translational modifications, including phosphorylation sites,[11] cell attachment sequences, N-myristoylation sites,[12] O-linked glycosylation sites,[13] glycation sites,[14] Ac-ASQK cleaved-acetylated sites, and sumoylation sites.[15][16] Many of the predicted phosphorylation sites are also predicted O-linked glycosylation sites, so glycosylation could block phosphorylation at those residues and thereby alter the structure or function of that region.

Secondary Structure

The predicted secondary structure consists of alpha helices and disordered/coil regions.[10][17][18][19][20] The predicted model covers 28% of the amino acid sequence with 42.9% confidence.[10]

Homology

Paralogs

There are currently no known paralogs to C22orf23.[21][22][23]

Orthologs

Orthologs are found in most major groups of species, ranging from the most similar in primates to the most distant in a member of the phylum Chytridiomycota. These groups include mammals, reptiles, birds, amphibians, bony fish, cartilaginous fish, invertebrates, and fungi. Orthologs may have first appeared in plants or fungi, but this is uncertain.[22]

This table lists several orthologs for C22orf23 and includes their species name, common name, taxonomic order, accession number, sequence length, sequence similarity,[22] and evolutionary date of divergence.[24]

Orthologs of C22orf23
Genus and Species | Common Name | Taxonomic Group (Order) | Date of Divergence (MYA) | Accession # | Sequence Length (aa) | Sequence Identity (%)
--- | --- | --- | --- | --- | --- | ---
Homo sapiens | Human | Primate | 0 | AAH31998.1 | 217 | 100
Piliocolobus tephrosceles | Ugandan Red Colobus | Primate | 29 | XP_023077818 | 217 | 95
Propithecus coquereli | Coquerel's Sifaka | Primate | 74 | XP_012493592 | 217 | 84
Marmota marmota marmota | Alpine Marmot | Rodent | 90 | XP_015338208 | 217 | 73
Mus caroli | Ryukyu Mouse | Rodent | 90 | XP_021038824 | 216 | 81
Physeter catodon | Sperm Whale | Even-toed ungulates | 96 | XP_007111804 | 217 | 85
Odocoileus virginianus texanus | White Tailed Deer | Artiodactyla | 96 | XP_020752151 | 217 | 86
Panthera pardus | Leopard | Carnivores | 96 | XP_019302406 | 217 | 85
Rousettus aegyptiacus | Egyptian Fruit Bat | Bat | 96 | XP_016017249 | 217 | 83
Condylura cristata | Star-nosed Mole | Eulipotyphla | 96 | XP_004676507 | 217 | 80
Vombatus ursinus | Common Wombat | Diprotodontia | 159 | XP_027727589 | 263 | 61
Nothoprocta perdicaria | Chilean Tinamou | Tinamiformes | 312 | XP_025895660 | 234 | 61
Serinus canaria | Atlantic Canary | Passeriformes | 312 | XP_009084739 | 223 | 57
Notechis scutatus | Tiger Snake | Squamata | 312 | XP_026550684 | 234 | 56
Nanorana parkeri | High Himalaya Frog | Frog | 352 | XP_018428081 | 225 | 51
Salvelinus alpinus | Arctic char | Salmoniformes | 435 | XP_023998646 | 217 | 49
Rhincodon typus | Whale Shark | Carpet shark | 473 | XP_020370272 | 232 | 48
Callorhinchus milii | Australian ghostshark | Chimaera | 473 | NP_001279734 | 232 | 46
Apostichopus japonicus | Sea Cucumber | Synallactida | 684 | PIK47438 | 221 | 48
Crassostrea virginica | Eastern Oyster | Ostreoida | 797 | XP_022313321.1 | 224 | 43
Capitella teleta | Segmented Annelid Worm | Capitellidae | 797 | ELU02060 | 221 | 39
Megachile rotundata | Leafcutter Bee | Hymenopterans | 797 | XP_003702438 | 230 | 36
Stylophora pistillata | Hood Coral | Stony corals | 824 | XP_022780055 | 219 | 42
Pocillopora damicornis | Cauliflower Coral | Scleractinia | 824 | XP_027046963 | 220 | 42
Macrostomum lignano | Flatworm | Macrostomida | 824 | PAA47644 | 270 | 38
Trichoplax | Trichoplax | Tricoplaciformes | 948 | RDD45244 | 239 | 39
Spizellomyces punctatus | Fungi | Spizellomyces punctatus | 1105 | XP_016608264[25] | 260 | 30
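
The trend in the table, with sequence identity falling as divergence time increases, can be summarized numerically. The sketch below computes a Pearson correlation over a hand-picked subset of rows from the table; the choice of rows and the use of a linear correlation are assumptions made for illustration, not an analysis reported by the sources.

```python
import math

# (date of divergence in MYA, sequence identity in %) copied from the table.
# ASSUMPTION: this subset of rows is chosen only for illustration.
rows = [
    (0, 100),    # Homo sapiens
    (29, 95),    # Piliocolobus tephrosceles
    (74, 84),    # Propithecus coquereli
    (159, 61),   # Vombatus ursinus
    (312, 61),   # Nothoprocta perdicaria
    (352, 51),   # Nanorana parkeri
    (435, 49),   # Salvelinus alpinus
    (473, 48),   # Rhincodon typus
    (684, 48),   # Apostichopus japonicus
    (797, 43),   # Crassostrea virginica
    (824, 42),   # Stylophora pistillata
    (948, 39),   # Trichoplax
    (1105, 30),  # Spizellomyces punctatus
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


divergence = [d for d, _ in rows]
identity = [i for _, i in rows]
r = pearson(divergence, identity)
print(f"Pearson r = {r:.2f}")
```

A strongly negative coefficient is consistent with the ordering of the table, although the relationship is visibly non-linear: identity drops quickly among mammals and flattens out among distant invertebrates.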

Expression

Promoter

The core promoter, GXP_7541220, is on the minus strand (-). It spans coordinates 37953445-37954669 and is 1225 base pairs long.[26]
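
The stated promoter length matches the coordinates if both endpoints are counted. The short check below assumes a closed (inclusive) coordinate interval; that convention is an assumption here, and the function name is illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length in base pairs of an interval whose start and end are both inclusive."""
    # ASSUMPTION: closed interval, so both endpoint positions are counted.
    return end - start + 1


promoter_length = span_length(37953445, 37954669)
print(promoter_length)
```

Under a half-open convention the same coordinates would give a length one base pair shorter, so the reported 1225 bp points to inclusive coordinates.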

Tissue expression

Human expression

Protein expression is highest in the testes, but it is also detected at low levels in many other tissues, including brain, kidney, stomach, skin,[27] thyroid, urinary bladder, placenta, endometrium, esophagus, appendix, bone marrow, adipose, lung,[28] and ovary.[29]

Ortholog expression

In Rattus norvegicus, the ortholog is expressed primarily in the testes, with low levels of expression in the kidneys, lungs, heart, and uterus.[30] In Mus musculus, the ortholog is expressed primarily in the adrenal gland and testes, and is also notably expressed in the bladder, abdomen, heart, lungs, ovaries, and mammary gland.[31]

Interactions

Protein Interactions

There are several predicted protein interactions: Cyclin-D1-binding protein 1, which may regulate cell cycle progression; Vacuolar protein sorting-associated protein 28 homolog, which is a regulator of vesicular trafficking; UPF0739 protein C1orf74; and estrogen related receptor gamma. These proteins were identified as having either direct interactions or physical associations with C22orf23, using detection methods that include affinity chromatography, two-hybrid prey pooling, and two-hybrid array.[32][33][34] C22orf23 also has predicted interactions with SH3 domain containing 19, EvC ciliary complex subunit 1, RIMS binding protein 3B, RIMS binding protein 3C, TSSK6-activating co-chaperone protein, V-set and immunoglobulin domain containing 8, family with sequence similarity 124 member B, small nucleolar RNA host gene 28, and transmembrane protein 200B. The evidence for a functional link in these cases is co-mention in PubMed.[32][35]

Clinical Significance

Disease Association

C22orf23 was identified in one of two groups of pooled serum samples in a study that compared serum glycoproteins in hepatocellular carcinoma with those in normal serum.[36] Deletions of parts of C22orf23 (exons 3 and 4) and of several other genes, including SOX10, have been observed in patients with peripheral demyelinating neuropathy, central demyelinating leukodystrophy, Waardenburg syndrome, and Hirschsprung disease, and are therefore suggested to be a potential factor in these disorders.[37][38] C22orf23 was also mentioned in a study of mutation profiles from ER+ breast cancer samples taken from postmenopausal patients, in which mutations affecting C22orf23 were found among those in many other genes.[39] In a study of epigenetic alterations in coronary artery disease, C22orf23 was found to have altered epigenetic modifications, suggesting it may be among novel genes involved in the disease.[40] In a study that attempted to predict imprinted genes that may be linked to human disorders, C22orf23 was identified as a homolog of imprinted gene candidates showing linkage to schizophrenia.[41] Another study listed it as a potently regulated protein in uterine leiomyoma.[42]

Mutations

A total of 3340 SNPs have been reported within the 5’ and 3’ UTRs, introns, and exons, as well as in near-gene regions flanking the 5’ and 3’ UTRs. Of these, 225 lie within the coding sequence. Some of the coding SNPs occur at conserved amino acids, and some have one or more types of validation. Some SNPs have high heterozygosity scores and are therefore common in the population.[43]

References

  1. "C22orf23 Gene (Protein Coding)". 2019-05-05. https://www.genecards.org/cgi-bin/carddisp.pl?gene=C22orf23&keywords=c22orf23. [permanent dead link]
  2. "Chromosome 22 open reading frame 23 (C22orf23)". https://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=517612.
  3. "Homo sapiens chromosome 22 open reading frame 23 (C22orf23), transcript variant 1, mRNA". https://www.ncbi.nlm.nih.gov/nuccore/NM_032561.4. 
  4. "C22orf23 Gene". https://www.genecards.org/cgi-bin/carddisp.pl?gene=C22orf23.
  5. "WikiGenes - Collaborative Publishing". http://www.wikigenes.org/. 
  6. "Cell atlas - C22orf23 - The Human Protein Atlas". https://www.proteinatlas.org/ENSG00000128346-C22orf23/cell#rna.
  7. "ExPASy - Compute pI/Mw tool". https://web.expasy.org/compute_pi/. 
  8. "TMpred - Prediction of Transmembrane Regions and Orientation". 2019-05-05. https://embnet.vital-it.ch/software/TMPRED_form.html. 
  9. "SAPS - Statistical Analysis of Protein Sequences". 2019-05-05. https://www.ebi.ac.uk/Tools/seqstats/saps/. 
  10. "PHYRE2 Protein Fold Recognition Server". http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index.
  11. "NetPhos 3.1 Server". 2019-05-05. http://www.cbs.dtu.dk/services/NetPhos/. 
  12. "Motif Scan". 2019-05-05. https://myhits.isb-sib.ch/cgi-bin/motif_scan. 
  13. "NetOGlyc 4.0 Server". 2019-05-05. http://www.cbs.dtu.dk/services/NetOGlyc/. 
  14. "NetGlycate 1.0 Server". 2019-05-05. http://www.cbs.dtu.dk/services/NetGlycate/. 
  15. "GPS SUMO - prediction of SUMOylation & SUMO- binding Motifs". 2019-05-05. http://sumosp.biocuckoo.org/. 
  16. "SUMOplot™ Analysis Program". 2019-05-05. https://www.abgent.com/sumoplot. 
  17. "CFSSP - Chou & Fasman secondary structure prediction server". 2019-05-04. http://www.biogem.org/tool/chou-fasman/. 
  18. "A Protein Secondary Structure Prediction Server". 2019-05-04. http://www.compbio.dundee.ac.uk/jpred/. 
  19. "SOPMA SECONDARY STRUCTURE PREDICTION METHOD". 2019-05-04. https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html. 
  20. "GOR IV SECONDARY STRUCTURE PREDICTION METHOD". 2019-05-04. https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_gor4.html. 
  21. "BLAT Search Genome". 2019-05-05. https://genome.ucsc.edu/cgi-bin/hgBlat. 
  22. "Standard Protein BLAST". 2019-05-05. https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome.
  23. "HomoloGene". 2019-05-05. https://www.ncbi.nlm.nih.gov/homologene/?term=c22orf23. 
  24. "TimeTree of Life". 2019-05-05. http://www.timetree.org/. 
  25. "Genome Sequence of Spizellomyces punctatus". Genome Announcements 4 (4): e00849-16. August 2016. doi:10.1128/genomeA.00849-16. PMID 27540072. 
  26. "Genomatix". 2019-05-05. https://www.genomatix.de/cgi-bin/eldorado/eldorado.pl?s=2eb513043fe3892d2d7475a1da5c3c15;RESULT=C22orf23_1. 
  27. "EST Profile - Hs.517612 - C22orf23". 2019-05-05. https://www.ncbi.nlm.nih.gov/UniGene/ESTProfileViewer.cgi?uglist=Hs.517612. [permanent dead link]
  28. "C22orf23 chromosome 22 open reading frame 23 [Homo sapiens (human)]". 2019-05-05. https://www.ncbi.nlm.nih.gov/gene/84645.
  29. "C22orf23". 2019-05-05. https://www.proteinatlas.org/ENSG00000128346-C22orf23/tissue. 
  30. "RGD1359634 similar to RIKEN cDNA 1700088E04 [Rattus norvegicus (Norway rat)]". 2019-05-05. https://www.ncbi.nlm.nih.gov/gene/315126.
  31. "1700088E04Rik RIKEN cDNA 1700088E04 gene [Mus musculus (house mouse)]". 2019-05-05. https://www.ncbi.nlm.nih.gov/gene/27660.
  32. "NCBI". 2019-05-05. https://www.ncbi.nlm.nih.gov/.
  33. "PSICQUIC View". 2019-05-05. http://www.ebi.ac.uk/Tools/webservices/psicquic/view/clustered.xhtml?conversationContext=1. 
  34. "Mentha". 2019-05-05. http://mentha.uniroma2.it/result.php?q=P62508&org=9606. 
  35. "String". 2019-05-05. https://string-db.org/cgi/network.pl?taskId=aBKybAnPxQ3K. 
  36. "Quantitative proteomic analysis for high-throughput screening of differential glycoproteins in hepatocellular carcinoma serum". Cancer Biology & Medicine 12 (3): 246–54. September 2015. doi:10.7497/j.issn.2095-3941.2015.0010. PMID 26487969. 
  37. "Heterozygous deletion at the SOX10 gene locus in two patients from a Chinese family with Waardenburg syndrome type II". International Journal of Pediatric Otorhinolaryngology 79 (10): 1718–21. October 2015. doi:10.1016/j.ijporl.2015.07.034. PMID 26296878. 
  38. "Deletions at the SOX10 gene locus cause Waardenburg syndrome types 2 and 4". American Journal of Human Genetics 81 (6): 1169–85. December 2007. doi:10.1086/522090. PMID 17999358. 
  39. Pascal, Gellert (December 2014). "Abstract S1-04: Exome sequencing of post-menopausal ER+ breast cancer (BC) treated pre-surgically with aromatase inhibitors (AIs) in the POETIC trial (CRUK/07/015)". Cancer Research 75 (9 Supplement): S1-04. doi:10.1158/1538-7445.SABCS14-S1-04.
  40. "Genome wide DNA methylation profiling for epigenetic alteration in coronary artery disease patients". Gene 541 (1): 31–40. May 2014. doi:10.1016/j.gene.2014.02.034. PMID 24582973. 
  41. "Genome-wide prediction of imprinted murine genes". Genome Research 15 (6): 875–84. June 2005. doi:10.1101/gr.3303505. PMID 15930497. 
  42. "Targeted cellular process profiling approach for uterine leiomyoma using cDNA microarray, proteomics and gene ontology analysis". International Journal of Experimental Pathology 84 (6): 267–79. December 2003. doi:10.1111/j.0959-9673.2003.00362.x. PMID 14748746. 
  43. "dbSNP Short Genetic Variations". 2019-05-05. https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?geneId=84645&ctg=NT_011520.13&mrna=NM_032561.4&prot=NP_115950.3&orien=reverse.