Biology:Chromosome 5 open reading frame 47

Short description: Human C5ORF47 Gene



Chromosome 5 Open Reading Frame 47, or C5ORF47, is a protein which, in humans, is encoded by the C5ORF47 gene.[1] It also goes by the alias LOC133491.[2] The human C5ORF47 gene is primarily expressed in the testis.[1]

Gene

C5ORF47 is located at 5q35.2.[1] The full gene spans 16,911 nucleotides, and the mRNA transcript, made up of 5 exons, spans 2511 nucleotides.[3]

Gene expression

Human C5ORF47 is primarily expressed in the testis. It is also expressed at low levels in many other tissues, including the stomach, lung, kidney, intestine, heart, and adrenal gland, at varying levels and times throughout fetal development.[1]

Transcript

The mature mRNA of C5ORF47 is 2511 nucleotides long and consists of 5 exons; the introns are removed from the transcript during splicing.[4]

Bird's eye view of human C5ORF47 promoter and gene. The promoter, start of transcription, and exons 1-5 are labeled.[3]

The human C5ORF47 gene has two known isoforms. The first (XP_016864517.1)[5] encodes a protein that is 176 amino acids in length, and the second (XP_011532733.1)[6] encodes a protein that is 150 amino acids in length. This article focuses primarily on the first, more common isoform.

Protein

The predicted molecular weight of the unmodified precursor human C5ORF47 protein is 19.2 kDa, and its predicted isoelectric point is 10.49.[7]

The protein is basic: it contains a relatively high proportion of the positively charged amino acids lysine and arginine compared with the negatively charged amino acids aspartic acid and glutamic acid. The repeated amino acid sequence "SQLR" occurs at two locations in the protein.[8]
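
The kind of composition analysis described above can be sketched in a few lines of Python. This is only an illustration, not the SAPS tool cited in the text: the function names are hypothetical, and the sequence is a made-up toy string, not the real C5ORF47 sequence.

```python
from collections import Counter


def charge_composition(seq):
    """Count basic (K, R) and acidic (D, E) residues in a protein sequence."""
    counts = Counter(seq.upper())
    basic = counts["K"] + counts["R"]
    acidic = counts["D"] + counts["E"]
    return basic, acidic


def find_motif(seq, motif):
    """Return the 1-based start position of every occurrence of motif."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical toy sequence for illustration only (NOT the C5ORF47 sequence).
toy_sequence = "MSQLRKKAEDSQLRRK"

basic, acidic = charge_composition(toy_sequence)
repeat_positions = find_motif(toy_sequence, "SQLR")
print(f"basic residues: {basic}, acidic residues: {acidic}")
print(f"SQLR found at positions: {repeat_positions}")
```

A sequence with many more basic than acidic residues, as in this toy example, would be expected to have a high isoelectric point, which is consistent with the value reported for C5ORF47.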

Domains

The human C5ORF47 protein contains DUF4680, a Domain of Unknown Function, that is characterized by two conserved amino acid sequence motifs: VISRM and ENE.[9]

Structure

Predicted tertiary structure of C5ORF47 protein. The 10 most conserved amino acids are labeled in yellow.[10]

Within the predicted tertiary structure of C5ORF47, the most conserved amino acids fall within the Domain of Unknown Function: DUF4680.

Cellular Localization

The human C5ORF47 protein is predicted to be localized in the nucleus.[11][12]

A stretch of five positively charged lysines at positions 133-137 of the amino acid sequence is consistent with a nuclear localization sequence.[11]

Immunohistochemical staining of human testis with a Sigma-Aldrich anti-C5orf47 antibody shows moderate nuclear positivity in cells of the seminiferous ducts.[13]
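
As a rough illustration of how such a lysine stretch can be located, the sketch below scans a sequence for runs of consecutive lysines. It is a simplification, not the PSORT II algorithm cited above: the function name and the `min_len` threshold are assumptions, and the sequence is a made-up toy string rather than the real C5ORF47 sequence.

```python
import re


def find_lysine_runs(seq, min_len=5):
    """Return (start, end) 1-based inclusive positions of runs of >= min_len lysines."""
    pattern = re.compile(r"K{%d,}" % min_len)
    # m.start() is 0-based and m.end() is exclusive, so (start + 1, end)
    # gives 1-based inclusive coordinates.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence for illustration only (NOT the C5ORF47 sequence).
toy_sequence = "MASTKKKKKLQ"
print(find_lysine_runs(toy_sequence))
```

Applied to the real protein sequence, a scan of this kind would be expected to report the run at positions 133-137 described in the text.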

Post-Translational Modifications

Human C5ORF47 protein diagram depicting DUF4680 highlighted in blue, nuclear localization sequence highlighted in pink, conserved phosphorylation sites denoted by yellow circles, and conserved O-linked glycosylation sites denoted by green squares.[8][11][14][15][16]

Predicted phosphorylation sites T19, S23, S96, S129, S149, Y158, S168, S169, and O-linked glycosylation sites S96, S105, and S168 are conserved among most mammalian orthologs.[14][15]

Homology

Conceptual translation of the C5ORF47 coding sequence. The exon boundaries are marked in blue. The domain of unknown function is enclosed in brackets. A disordered region is shown in gray. Poly(A) signals and poly(A) sites are shown in yellow. The ten most conserved amino acids are in bold. The eight most conserved phosphorylation sites are highlighted in gold. Repeat sequences are highlighted in green. The FxxP domain is highlighted in purple. The nuclear localization sequence is highlighted in pink. The underlined segments are the two conserved amino acid sequence motifs that characterize DUF4680.[17]

Orthologs of the human C5ORF47 gene can be found in mammals, birds, and reptiles, but not in amphibians, fish, or invertebrates. No paralogs of the human C5ORF47 gene are known.[18][19][20]

| Class | Taxonomic Order | Genus and Species | Common Name | Median Date of Divergence (MYA) | Sequence Length (amino acids) | Sequence Identity to Human Protein (%) | Sequence Similarity to Human Protein (%) |
|---|---|---|---|---|---|---|---|
| Mammalia | Primates | Homo sapiens | Human | 0 | 176 | 100 | 100 |
| Mammalia | Primates | Gorilla gorilla | Gorilla | 8.6 | 176 | 97.7 | 98 |
| Mammalia | Rodentia | Mus musculus | House mouse | 87 | 165 | 50 | 63.1 |
| Mammalia | Chiroptera | Pteropus giganteus | Indian flying fox | 94 | 176 | 58 | 64.9 |
| Mammalia | Carnivora | Neomonachus schauinslandi | Hawaiian monk seal | 94 | 178 | 56.7 | 63.9 |
| Aves | Passeriformes | Onychostruthus taczanowskii | White-rumped snowfinch | 319 | 173 | 33.2 | 42.8 |
| Aves | Casuariiformes | Dromaius novaehollandiae | Emu | 319 | 182 | 32 | 44.5 |
| Aves | Passeriformes | Taeniopygia guttata | Zebra finch | 319 | 177 | 30.8 | 39.3 |
| Aves | Apterygiformes | Apteryx rowi | Okarito kiwi | 319 | 180 | 30.5 | 41 |
| Aves | Caprimulgiformes | Antrostomus carolinensis | Chuck-will's-widow | 319 | 229 | 26.8 | 37 |
| Aves | Apodiformes | Calypte anna | Anna's hummingbird | 319 | 228 | 24 | 35.6 |
| Aves | Galliformes | Phasianus colchicus | Ring-necked pheasant | 319 | 263 | 20.5 | 30.9 |
| Reptilia | Squamata | Zootoca vivipara | Viviparous lizard | 319 | 205 | 30 | 42.3 |
| Reptilia | Squamata | Podarcis muralis | Common wall lizard | 319 | 243 | 28.9 | 39.9 |
| Reptilia | Testudines | Dermochelys coriacea | Leatherback sea turtle | 319 | 242 | 27.6 | 41.3 |
| Reptilia | Testudines | Gopherus flavomarginatus | Bolson tortoise | 319 | 235 | 26.6 | 38.1 |
| Reptilia | Testudines | Chelonoidis abingdonii | Pinta Island tortoise | 319 | 234 | 26.6 | 35.5 |
| Reptilia | Squamata | Sceloporus undulatus | Eastern fence lizard | 319 | 241 | 24.7 | 34.9 |
| Reptilia | Crocodilia | Alligator mississippiensis | American alligator | 319 | 307 | 22.8 | 32.2 |
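
The identity percentages in the table come from pairwise alignments. As a minimal sketch of how such a figure is derived, the function below counts identical aligned positions and divides by the alignment length, gaps included. The function name and the toy alignment are hypothetical, and the exact convention used by a given alignment tool may differ. Computing similarity would also require a substitution matrix (for example BLOSUM62), which is omitted here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    The denominator is the full alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical toy alignment for illustration only (not real C5ORF47 data).
row_a = "MKT-LLR"
row_b = "MRTALLK"
print(f"{percent_identity(row_a, row_b):.1f}% identity")
```

Because the denominator includes gap columns, orthologs that are much longer or shorter than the 176-residue human protein can score lower than their shared region alone would suggest.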

Interacting Proteins

Proteins that are predicted to interact with the human C5ORF47 protein tend to be testis-specific, associated with sperm or spermatogenesis, or involved in cilia or flagella formation.[21][22]

| Interacting Protein | Full Name | Cellular Compartment | Function |
|---|---|---|---|
| CCDC185 | Coiled-coil domain-containing protein 185 | Cellular localization unknown | Has a role in ciliogenesis (by similarity). Required for proper cephalic and left/right axis development.[23] |
| C10orf120 | Uncharacterized protein C10orf120 | Cellular localization unknown | Diseases associated with C10orf120 include congenital bilateral aplasia of the vas deferens, which occurs in males when the tubes that carry sperm out of the testes (the vas deferens) fail to develop properly.[24][25] |
| C4orf22 | Uncharacterized protein C4orf22 | Predicted to be located in the cytoplasm.[26] | Cilia- and flagella-associated protein.[26] |
| TGIF2LX | TGFB-induced factor homeobox 2-like, X-linked; Homeobox protein TGIF2LX | Predicted to be located in the nucleus.[27] | May have a transcription role in testis. Testis-specific expression suggests that this gene may play a role in spermatogenesis.[27] |
| ZPLD1 | Zona pellucida-like domain-containing protein 1 | Predicted to be an extracellular matrix structural constituent; predicted to be located in the cytoplasmic vesicle membrane; predicted to be an integral component of the membrane; predicted to be active at the cell surface and in the extracellular space.[28] | Glycoprotein that is a component of the gelatinous extracellular matrix in the cupulae of the vestibular organ.[28] |
| SPERT | Spermatid-associated protein | Predicted to be located in a cytoplasmic vesicle.[29] | Enables identical protein binding activity.[29] |
| ZNF606 | Zinc finger protein 606 | Predicted to be located in the nucleus.[30] | Nuclear protein that can act as a transcriptional repressor of growth factor-mediated signaling pathways. Reduced expression of this gene promotes chondrocyte differentiation.[30] |
| C3orf20 | Uncharacterized protein C3orf20 | Predicted to be located in the cytoplasm.[31] | Unknown function |
| C14orf119 | Uncharacterized protein C14orf119 | Located in the cytosol and mitochondria.[32] | Unknown function |

Clinical Significance

In a study conducted to identify rare genetic variants contributing to neuromyelitis optica in Finland, four missense variants, in C3ORF20, PDZD2, C5ORF47, and ZNF606, were shared by two patients.[33]

Microarray data show that human C5ORF47 expression is low in an individual with teratozoospermia, a condition in which more than 85% of spermatozoa have abnormal morphology.[34][35]

Microarray data show that human C5ORF47 expression is lower in p63-depleted cells.[36] The p63 protein functions as a transcription factor that helps regulate numerous cell activities, including cell proliferation, cell maintenance, differentiation, cell adhesion, and apoptosis. The p63 protein also plays a critical role in the formation of ectodermal structures in early development. Studies suggest that it also plays essential roles in the development of the limbs, facial features, urinary system, and other organs and tissues.[37]

References

  1. "C5orf47 chromosome 5 open reading frame 47 [Homo sapiens (human)] - Gene - NCBI". https://www.ncbi.nlm.nih.gov/gene/133491.
  2. "AceView: Gene:C5orf47, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&term=C5ORF47&submit=Go.
  3. "User Sequence vs Genomic". https://genome.ucsc.edu/cgi-bin/hgc?o=173989169&g=htcUserAli&i=../trash/hgSs/hgSs_genome_2b5c1_e5dd30.pslx+../trash/hgSs/hgSs_genome_2b5c1_e5dd30.fa+YourSeq&c=chr5&l=173989169&r=174006140&db=hg38&hgsid=1510388993_lRMfFMDn7akRNi3MnscMdBSFH9o5.
  4. "Homo sapiens chromosome 5 open reading frame 47 (C5orf47), mRNA" (in en-US). U.S. National Library of Medicine. 2022-08-13. http://www.ncbi.nlm.nih.gov/nuccore/NM_001144954.2. 
  5. "uncharacterized protein C5orf47 isoform X1 [Homo sapiens] - Protein - NCBI". https://www.ncbi.nlm.nih.gov/protein/XP_016864517.1.
  6. "uncharacterized protein C5orf47 isoform X2 [Homo sapiens] - Protein - NCBI". https://www.ncbi.nlm.nih.gov/protein/XP_011532733.1.
  7. "Expasy - Compute pI/Mw tool". https://web.expasy.org/compute_pi/. 
  8. "SAPS < Sequence Statistics < EMBL-EBI". https://www.ebi.ac.uk/Tools/seqstats/saps/.
  9. "MOTIF: Searching Protein Sequence Motifs". https://www.genome.jp/tools/motif/. 
  10. "AlphaFold Protein Structure Database". https://alphafold.ebi.ac.uk/entry/Q569G3. 
  11. "PSORT II Prediction". https://psort.hgc.jp/form2.html.
  12. "DeepLoc2.0" (in en). 2022-12-15. https://services.healthtech.dtu.dk/service.php?DeepLoc-2.0. 
  13. SigmaAldrich (2022-12-15). "Anti-C5orf47 antibody produced in rabbit". https://www.sigmaaldrich.com/US/en/product/sigma/hpa048716. 
  14. "C5orf47 (human)". https://www.phosphosite.org/proteinAction.action?id=3508849&showAllSites=true.
  15. "Services" (in en). https://services.healthtech.dtu.dk/.
  16. "IBS - Online". http://ibs.biocuckoo.org/online.php#. 
  17. "Six-Frame Translation". https://www.bioline.com/media/calculator/01_13.html. 
  18. "BLAST: Basic Local Alignment Search Tool". https://blast.ncbi.nlm.nih.gov/Blast.cgi. 
  19. "TimeTree :: The Timescale of Life" (in en). http://timetree.org/. 
  20. "EMBOSS Needle". 2022-12-15. https://www.ebi.ac.uk/Tools/psa/emboss_needle/. 
  21. "C5orf47 protein (human) - STRING interaction network". https://string-db.org/network/9606.ENSP00000340887. 
  22. "Germ-Cell-Specific Inflammasome Component NLRP14 Negatively Regulates Cytosolic Nucleic Acid Sensing to Promote Fertilization". Immunity 46 (4): 621–634. April 2017. doi:10.1016/j.immuni.2017.03.020. PMID 28423339. 
  23. "UniProt". https://www.uniprot.org/uniprotkb/Q9BV29/entry. 
  24. "C10orf120 Gene - GeneCards | CJ120 Protein | CJ120 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=C10orf120. 
  25. "Congenital bilateral absence of the vas deferens - About the Disease - Genetic and Rare Diseases Information Center" (in en). https://rarediseases.info.nih.gov/diseases/5461/congenital-bilateral-absence-of-the-vas-deferens. 
  26. "CFAP299 Gene - GeneCards | CF299 Protein | CF299 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=CFAP299.
  27. "TGIF2LX Gene - GeneCards | TF2LX Protein | TF2LX Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=TGIF2LX.
  28. "ZPLD1 Gene - GeneCards | ZPLD1 Protein | ZPLD1 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZPLD1.
  29. "CBY2 Gene - GeneCards | CBY2 Protein | CBY2 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=CBY2.
  30. "ZNF606 Gene - GeneCards | ZN606 Protein | ZN606 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=ZNF606.
  31. "C3orf20 Gene - GeneCards | CC020 Protein | CC020 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=C3orf20. 
  32. "C14orf119 Gene - GeneCards | CN119 Protein | CN119 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=C14orf119. 
  33. "Exome and regulatory element sequencing of neuromyelitis optica patients". Journal of Neuroimmunology 289: 139–142. December 2015. doi:10.1016/j.jneuroim.2015.11.002. PMID 26616883. 
  34. "GDS2697 / 1557057_a_at". https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2697:1557057_a_at. 
  35. "Genetic aspects of monomorphic teratozoospermia: a review". Journal of Assisted Reproduction and Genetics 32 (4): 615–623. April 2015. doi:10.1007/s10815-015-0433-2. PMID 25711835. 
  36. "GDS2534 / 1557056_at". https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS2534:1557056_at. 
  37. "TP63 gene: MedlinePlus Genetics" (in en). https://medlineplus.gov/genetics/gene/tp63/.