Biology:CXorf66

Short description: Human protein




CXorf66, also known as Chromosome X Open Reading Frame 66, is a 361aa protein in humans that is encoded by the CXorf66 gene. The encoded protein is predicted to be a type 1 transmembrane protein; however, its exact function is currently unknown.[1]

CXorf66 appears in US patent 8586006 from the Institute for Systems Biology and Integrated Diagnostics, Inc.[2]

CXorf66 protein is a potential novel cancer biomarker.[3]

Gene

CXorf66 is located on the X chromosome at Xq27.1, on the complement strand.[4] Its neighboring genes are the ATPase gene ATP11C, MIR505, and HNRNPA3P3.[4] OMIM additionally places CXorf66 near SOX3, SPANXB1, and CDR1.[5]

Gene locus for human gene CXorf66 on Chromosome X

mRNA

Splice variants

CXorf66 has only one known splice variant, with three exons (1-117, 118-271, and 272-1288 bp) and two introns.[6] The exon junctions fall at 30aa [G] and 81aa [M].[6]

Splicing of CXorf66 gene
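The exon coordinates quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the coordinates are 1-based and inclusive on the spliced mRNA.

```python
# Exon coordinates on the mRNA as quoted in the text
# (assumed 1-based and inclusive).
EXONS = [(1, 117), (118, 271), (272, 1288)]

def exon_lengths(exons):
    """Return the length in bp of each exon."""
    return [end - start + 1 for start, end in exons]

def are_contiguous(exons):
    """True if every exon starts one base after the previous exon ends."""
    return all(nxt[0] == prev[1] + 1 for prev, nxt in zip(exons, exons[1:]))

lengths = exon_lengths(EXONS)
print(lengths)                 # [117, 154, 1017]
print(sum(lengths))            # 1288
print(are_contiguous(EXONS))   # True
```

The second exon is 154 bp, or about 51 codons, which agrees with the two junctions lying 51 residues apart (30aa and 81aa).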

CXorf66 has been found to have only one polyadenylation site.[7]

Protein

Composition

With 57 serines and 42 lysines, the CXorf66 protein is both serine and lysine rich.[8] CXorf66 has a molecular weight of 39.9 kDa and an isoelectric point of 9.89.[8]
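To put the residue counts in proportion, the sketch below converts them to percentages of the 361aa sequence and derives the average mass per residue from the quoted molecular weight. The variable and function names are hypothetical; only the numbers come from the text.

```python
PROTEIN_LENGTH = 361            # residues
MOLECULAR_WEIGHT_DA = 39_900    # 39.9 kDa, as quoted
RESIDUE_COUNTS = {"S": 57, "K": 42}

def percent_composition(counts, length):
    """Return each residue's share of the sequence, in percent."""
    return {aa: round(100 * n / length, 1) for aa, n in counts.items()}

composition = percent_composition(RESIDUE_COUNTS, PROTEIN_LENGTH)
mean_residue_mass = round(MOLECULAR_WEIGHT_DA / PROTEIN_LENGTH, 1)

print(composition)         # {'S': 15.8, 'K': 11.6}
print(mean_residue_mass)   # 110.5
```

As a rough reference point, a sequence using all 20 amino acids equally would contain 5% of each, so serine and lysine together make up over a quarter of the protein.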

Domains

The CXorf66 protein has a predicted signal peptide at 1-19aa, a topological domain at 20-47aa, a transmembrane domain at 48-68aa, and a second topological domain at 69-361aa.[9] A signal peptide cleavage site is predicted between residues 17 and 18.[10] Based on the protein's composition (serine and lysine rich) and post-translational modifications (high levels of phosphorylation), the first topological domain [20-47aa] is predicted to be extracellular and the second [69-361aa] cytoplasmic. Figure II shows the predicted topology.[11]

Figure I. Candidate phosphorylation sites in human CXorf66 protein
Figure II. SOSUI Prediction of CXorf66 protein transmembrane topology

Three motifs, DKPV [31-34 and 204-207aa], SEAK [97-100 and 287-290aa], and PKRS [161-164 and 245-248aa], each occur twice in the human CXorf66 protein. These repeats are conserved in other primates such as Gorilla gorilla gorilla and Macaca mulatta, but are not present in other mammals.[12]
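The domain boundaries and repeat positions above can be combined to see where each repeat copy falls. This is a minimal sketch: the short domain labels and the function names are hypothetical, and the extracellular/cytoplasmic labels follow the predicted topology rather than experimental data.

```python
# Predicted domains (1-based, inclusive residue ranges from the text).
DOMAINS = [
    ("signal peptide", 1, 19),
    ("extracellular", 20, 47),     # first topological domain (predicted)
    ("transmembrane", 48, 68),
    ("cytoplasmic", 69, 361),      # second topological domain (predicted)
]

# Repeat motifs and the residue ranges of their two copies.
REPEATS = {
    "DKPV": [(31, 34), (204, 207)],
    "SEAK": [(97, 100), (287, 290)],
    "PKRS": [(161, 164), (245, 248)],
}

def domain_of(position):
    """Return the name of the domain containing a residue position."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the 361aa protein")

def domains_spanned(start, end):
    """Return the sorted domain names covered by residues start..end."""
    return sorted({domain_of(p) for p in range(start, end + 1)})

for motif, copies in REPEATS.items():
    for start, end in copies:
        print(motif, f"{start}-{end}", domains_spanned(start, end))
```

Under these boundaries, only the first DKPV copy (31-34aa) lies in the predicted extracellular domain; the other five repeat copies lie in the predicted cytoplasmic domain.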

SNPs

One natural variant is known in the CXorf66 protein: a proline-to-leucine substitution at 233aa (population frequency 0.436), with proline being the ancestral amino acid. No effects of this missense variant have been observed.[9][13]

Interacting proteins

Based on STRING's predicted protein interactions, CXorf66 has medium-level scores for association with the proteins listed in Figure III.[14] None of the listed interactions has been experimentally determined.

Regulation

Transcription

Promoter

Genomatix predicts only one promoter for CXorf66, a 745 bp region on the negative strand at 139047554-139048298.[15] A BLAT search with this promoter sequence retrieved numerous high-identity hits for various genes on different chromosomes. The following are a few of the top-scoring results:[16]

| Name | Gene ID | Score | Span (bp, out of 745) | Identity | Chromosome | Strand | Start | End |
|---|---|---|---|---|---|---|---|---|
| ZBTB8A | 653121 | 282 | 656 | 88.2% | 1 | - | 32994892 | 32995547 |
| TESK2 | 10420 | 263 | 624 | 90.3% | 1 | - | 45843093 | 45843716 |
| TBCK | 93627 | 244 | 639 | 91.5% | 4 | + | 107146630 | 107147268 |
| USP48 | 84196 | 241 | 631 | 89.0% | 1 | + | 22014725 | 22015355 |
| PTPN22 | 26191 | 227 | 281 | 90.0% | 1 | - | 114365307 | 114365587 |
| PSPH | 5723 | 220 | 605 | 90.6% | 7 | - | 56098319 | 56098923 |

Notably, TESK2 is a testis-specific protein kinase, which is consistent with the predicted tissue expression of CXorf66.
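The coordinates in the promoter description and the BLAT table can be cross-checked against the reported lengths. The sketch below assumes inclusive start and end coordinates; the function names are hypothetical and the values are transcribed from the table above.

```python
PROMOTER_START, PROMOTER_END = 139047554, 139048298

# (gene, reported span in bp, start, end), transcribed from the table.
BLAT_HITS = [
    ("ZBTB8A", 656, 32994892, 32995547),
    ("TESK2", 624, 45843093, 45843716),
    ("TBCK", 639, 107146630, 107147268),
    ("USP48", 631, 22014725, 22015355),
    ("PTPN22", 281, 114365307, 114365587),
    ("PSPH", 605, 56098319, 56098923),
]

def interval_length(start, end):
    """Length of an interval with inclusive endpoints."""
    return end - start + 1

def mismatched_spans(hits):
    """Return genes whose reported span disagrees with end - start + 1."""
    return [gene for gene, span, start, end in hits
            if interval_length(start, end) != span]

print(interval_length(PROMOTER_START, PROMOTER_END))  # 745
print(mismatched_spans(BLAT_HITS))                    # []
```

Every reported span matches its coordinates, and the promoter interval is 745 bp as stated.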

Transcription factors

Genomatix was used to generate a table of the top 20 transcription factors and their binding sites in the CXorf66 promoter (see Figure IV).[15]

Figure IV: Generated list of transcription factor binding sites in CXorf66 promoter

Translation

Two miRNAs, hsa-mir-1290 and hsa-miR-4446-5p, are predicted to bind to the 3' UTR of the CXorf66 mRNA.[17]

Post-translational modifications

Expasy's NetNGlyc predicts an N-glycosylation site at NGSS [24aa], with a possible secondary site at NGTN [21aa].[18] NetPhos predicts a total of 48 phosphorylation sites (41 serines, 2 threonines, and 5 tyrosines), all of which occur after the predicted transmembrane domain, suggesting that this region is cytoplasmic.[19] YinOYang predicts many O-GlcNAc sites; all of those with high potential occur after the 48-68aa transmembrane region.[20] A SUMOplot analysis of the Homo sapiens CXorf66 protein found a high-probability sumoylation motif at K241 and low-probability motifs at K316 and K186. Because sumoylation plays a role in cellular processes such as nuclear-cytosolic transport and transcriptional regulation, CXorf66 is expected to be modified by a SUMO protein post-translationally.[21]
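Both predicted N-glycosylation motifs fit the standard N-X-S/T sequon (X being any residue except proline). The sketch below scans for that pattern with a regular expression. It is not how NetNGlyc works, which scores sequence context with a trained model. The fragment NGTNGSS for residues 21-27 is inferred from the two overlapping motifs quoted above, not taken from a sequence record, and the function name is hypothetical.

```python
import re

# N-X-S/T sequon, X != proline. The lookahead lets overlapping
# sequons be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(fragment, first_residue=1):
    """Return 1-based protein positions of asparagines that start a sequon.

    first_residue is the protein position of the fragment's first letter.
    """
    return [m.start() + first_residue for m in SEQUON.finditer(fragment)]

# Residues 21-27, assembled from NGTN at 21aa and NGSS at 24aa
# (they share the asparagine at position 24).
FRAGMENT_21_27 = "NGTNGSS"

print(sequon_positions(FRAGMENT_21_27, first_residue=21))  # [21, 24]
```

Both asparagines sit in the 20-47aa topological domain, which fits its predicted extracellular orientation, since N-glycosylation occurs on the non-cytoplasmic side of membrane proteins.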

Subcellular localization

Figure V. CXorf66 Nuclear Localization Signals Across Homologs

PSORT II predicts a nuclear localization signal, PYKKKHL, at 268aa.[22] This signal is conserved in other primate species but is not present in other mammals. In addition, according to the kNN prediction from SAPS in SDSC's Biology Workbench, the human CXorf66 protein and its mouse homolog each have a 47.8% likelihood of localizing to the nucleus. More distant homologs that lack a nuclear localization signal, such as that of Bos taurus, instead have a 34.8% likelihood of localizing to the extracellular region (including the cell wall) or the plasma membrane.[8][22] Figure V shows several homologs and their nuclear localization signals.

Homology

CXorf66 has no known paralogs in humans; however, it has conserved homologs throughout the class Mammalia. Although highly conserved in primates, CXorf66 shows noticeably rapid evolution (see Figure VI), which would explain the greater number of orthologs in mammals than in invertebrates, birds, and reptiles.[23]

Figure VI. CXorf66 Protein Homolog Divergence
| CXorf66 protein | Species | Date of divergence (MYA)[24] | NCBI accession number | Query cover | E value | Identity |
|---|---|---|---|---|---|---|
| CXorf66 homolog | Chimpanzee (Pan troglodytes) | 6.3 | XP_001139133.1 | 100% | 0 | 98% |
| CXorf66 homolog | Gorilla (Gorilla gorilla gorilla) | 8.8 | XP_004065002.1 | 100% | 0 | 98% |
| LOC631784 isoform X1 | Mouse (Mus musculus) | 92.3 | XP_006528296.1 | 98% | 2E-41 | 34% |
| CXorf66-like isoform X1 | Rat (Rattus norvegicus) | 92.3 | XP_001068529.2 | 84% | 6E-32 | 32% |
| CXorf66 homolog | Cow (Bos taurus) | 94.2 | XP_005200949.1 | 96% | 2E-46 | 35% |
| CXorf66 homolog | White rhino (Ceratotherium simum simum) | 94.2 | XP_004441715.1 | 100% | 8E-86 | 48% |
| CXorf66 homolog | Horse (Equus caballus) | 94.2 | XP_005614614.1 | 96% | 8E-58 | 44% |
| Neurofilament medium polypeptide | Zebra finch (Taeniopygia guttata) | 296 | XP_002197538.1 | 44% | 2E-08 | 30% |
| Triadin-like, partial | Alligator (Alligator mississippiensis) | 296 | XP_006271227.1 | 53% | 2E-12 | 23% |
| LOC590028 | Sea urchin (Strongylocentrotus purpuratus) | 742.9 | XP_794743.3 | 45% | 2E-05 | 35.4% |
| Alpha-L-fucosidase | Streptococcus mitis | 2535.8 | WP_001083113.1 | 47% | 7E-38 | 22% |
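The drop in sequence identity between primate and non-primate mammalian homologs can be summarized from the Figure VI values. This is a minimal sketch: the "primate" and "other mammal" group labels are added here for illustration, the function name is hypothetical, and only the mammalian rows are included.

```python
# (species, group, divergence in MYA, percent identity to human CXorf66)
HOMOLOGS = [
    ("Pan troglodytes", "primate", 6.3, 98),
    ("Gorilla gorilla gorilla", "primate", 8.8, 98),
    ("Mus musculus", "other mammal", 92.3, 34),
    ("Rattus norvegicus", "other mammal", 92.3, 32),
    ("Bos taurus", "other mammal", 94.2, 35),
    ("Ceratotherium simum simum", "other mammal", 94.2, 48),
    ("Equus caballus", "other mammal", 94.2, 44),
]

def mean_identity_by_group(rows):
    """Average the percent identity within each group label."""
    grouped = {}
    for _species, group, _mya, identity in rows:
        grouped.setdefault(group, []).append(identity)
    return {g: round(sum(v) / len(v), 1) for g, v in grouped.items()}

print(mean_identity_by_group(HOMOLOGS))
# {'primate': 98.0, 'other mammal': 38.6}
```

Identity falls from 98% among the listed primates to under 40% on average among the other listed mammals, which diverged about 92-94 million years ago.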

Expression

According to Unigene's EST cDNA tissue abundance display and the Protein Atlas, CXorf66 has moderately high expression in the testes and higher expression in fetal tissue than in other developmental stages.[25][26] CXorf66 also shows a notably low presence in both control endometrium total RNA and endometriosis total RNA.[27] CXorf66 has been reported to have a notable presence in plasma and platelets.[1] In PaxDb data, CXorf66 ranks in the top 5% in one study of human plasma and in the top 25% in another study of human platelets.[28] In addition, CXorf66 protein presence of 60–100% has been reported in both non-failing and dilated cardiomyopathy septum tissue,[29] and of about 75% in peripheral blood mononuclear cells.[30]

Unigene EST Tissue Expression Data for Human CXorf66 Protein
GeneCards Predicted Tissue Expression of Human CXorf66 Protein

References

  1. GeneCard for CXorf66
  2. "Organ-specific proteins and methods of their use". US8586006 Patent. Institute For Systems Biology, Integrated Diagnostics, Inc. http://www.google.com/patents/US8586006.
  3. "A novel transmembrane glycoprotein cancer biomarker present in the X chromosome". Cancer Genomics & Proteomics 11 (2): 81–92. 2014. PMID 24709545.
  4. "NCBI CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/protein/NP_001013421.1.
  5. "OMIM CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. http://www.omim.org/geneMap/X?start=652&limit=10.
  6. "NCBI AceView CXorf66 protein". Conserved Domain Database. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&term=cxorf66&submit=Go.
  7. "Softberry". Softberry.com. http://www.softberry.com.
  8. Department of Bioengineering. "SDSC Biology Workbench". University of California, San Diego. http://workbench.sdsc.edu/.
  9. "UniProtKB/Swiss-Prot Q5JRM2". UniProt consortium. https://www.uniprot.org/uniprot/Q5JRM2.
  10. "SignalP 4.0: discriminating signal peptides from transmembrane regions". Nature Methods. http://www.cbs.dtu.dk/services/SignalP/.
  11. Hirokawa T.; Boon-Chieng S.; Mitaku S. (1998). "SOSUI: classification and secondary structure prediction system for membrane proteins". Bioinformatics (Journal of Bioinformatics) 14 (4): 378–379. doi:10.1093/bioinformatics/14.4.378. PMID 9632836. http://harrier.nagahama-i-bio.ac.jp/sosui/sosui_submit.html. Retrieved 2015-03-11.
  12. Brendel, V., Bucher, P., Nourbakhsh, I.R., Blaisdell, B.E. & Karlin, S. (1992). "Methods and algorithms for statistical analysis of protein sequences". SAPS (Statistical Analysis of PS) (Proc. Natl. Acad. Sci. U.S.A.) 89 (6): 2002–2006. doi:10.1073/pnas.89.6.2002. PMID 1549558. PMC 48584. Bibcode: 1992PNAS...89.2002B. http://workbench.sdsc.edu/. Retrieved 2015-03-11.
  13. "dbSNP". Conserved Domain Database. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/SNP/.
  14. "CXorf66 Protein Interactions". STRING - Known and Predicted Protein-Protein Interactions. String-db.org. http://string-db.org/.
  15. Genomatix Software. "Genomatix ElDorado". http://www.genomatix.de/.
  16. Jim Kent. "BLAT". UCSC Genome Bioinformatics. http://genome.ucsc.edu/FAQ/FAQblat.html.
  17. "miRBase: the microRNA database". University of Manchester. http://www.mirbase.org/.
  19. Blom, N.; Gammeltoft, S.; Brunak, S. (1999). "Sequence- and structure-based prediction of eukaryotic protein phosphorylation sites". Journal of Molecular Biology 294 (5): 1351–1362. doi:10.1006/jmbi.1999.3310. PMID 10600390. http://www.cbs.dtu.dk/services/NetPhos/. Retrieved 2015-03-11.
  20. R Gupta. "Prediction of glycosylation sites in proteomes: from post-translational modifications to protein function". Cbs.dtu.dk. http://www.cbs.dtu.dk/services/YinOYang/.
  21. "GPS-SUMO: a tool for the prediction of sumoylation sites and SUMO-interaction motifs". Nucleic Acids Research. http://sumosp.biocuckoo.org/.
  22. Paul Horton. "PSORT II". Psort.hgc.jp. http://psort.hgc.jp/.
  23. "BLAST: Basic Local Alignment Search Tool". Conserved Domain Database. National Center for Biotechnology Information. http://blast.ncbi.nlm.nih.gov/Blast.cgi.
  24. "The Timescale of Life". TimeTree. http://www.timetree.org/.
  25. "Unigene". Conserved Domain Database. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/unigene.
  26. "Towards a knowledge-based Human Protein Atlas". The Human Protein Atlas (Nat Biotechnology) 28 (12): 1248–1250. 2010. doi:10.1038/nbt1210-1248. PMID 21139605. http://www.proteinatlas.org/ENSG00000203933/tissue. Retrieved 2015-03-11.
  27. "CXorf66 -Endometriosis". NCBI GEO Profile. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS3092:490013.
  28. Wang, M. "PaxDb CXorf66". PaxDb: Protein Abundance Across Organisms. Mol Cell Proteomics. http://pax-db.org/#!protein/990172.
  29. "CXorf66 - Dilated cardiomyopathy: septum". NCBI GEO Profile. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/geoprofiles/27908033.
  30. "Occupational benzene exposure: peripheral blood mononuclear cells (HumanRef-8)". NCBI GEO Profile. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/geo/tools/profileGraph.cgi?ID=GDS3561:GI_37546417-S.

External links