Biology:SMIM23

From HandWiki
Short description: Protein-coding gene in the species Homo sapiens

SMIM23, or Small Integral Membrane Protein 23, is a protein that in humans is encoded by the SMIM23 gene, also known as C5orf50. The longer mRNA isoform is 519 nucleotides long and translates to a protein of 172 amino acids.[1] A genome-wide association study has identified this gene, along with a few others, as potentially playing a role in how facial morphology arises in humans.[2]
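The relationship between the 519-nucleotide length and the 172-residue protein can be checked with a short calculation. The following is a minimal sketch in Python, assuming the 519 nucleotides are the coding sequence and include the stop codon (an assumption that fits the figures above but is not stated in the source). The function name is illustrative.

```python
def protein_length(cds_nucleotides: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by a coding sequence.

    Assumption: the length is a whole number of codons (3 nucleotides each),
    and the final codon is a stop codon when includes_stop is True.
    """
    if cds_nucleotides % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    codons = cds_nucleotides // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons


print(protein_length(519))  # 173 codons, one of which is the stop codon
```

Under this assumption, 519 nucleotides give 173 codons, which corresponds to 172 amino acids plus a stop codon.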

Gene

A table with accession number, chromosome location, strand location, size, and known aliases.

SMIM23 is a protein-encoding gene. Basic information about its aliases and chromosome location is given in the table. The schematic of the chromosome helps to visualize the location of the gene.

Human chromosome 5.png

mRNA

The gene has two splice isoforms (X1 and X2). It has three exon/exon boundaries, indicating four exons (nucleotides 1-105, 106-157, 158-225, and 226-519).[3]
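The exon coordinates above can be turned into exon lengths and junction positions with a few lines of Python. This is a minimal sketch using only the coordinates given in the text. The variable and function names are illustrative.

```python
# Exon coordinates (1-based, inclusive) as listed in the text.
EXONS = [(1, 105), (106, 157), (158, 225), (226, 519)]


def exon_lengths(exons):
    """Length of each exon; both endpoints are inclusive."""
    return [end - start + 1 for start, end in exons]


def junctions(exons):
    """Last nucleotide of every exon except the final one (exon/exon boundaries)."""
    return [end for _, end in exons[:-1]]


lengths = exon_lengths(EXONS)
print(lengths)           # length of each of the four exons
print(sum(lengths))      # total should equal the 519-nucleotide transcript
print(junctions(EXONS))  # three boundaries for four exons
```

The four exon lengths add up to 519 nucleotides, and four exons give three boundaries, which agrees with the description above.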

Conceptual translation of SMIM23. Labeled are the start and stop codons, exon splice sites, and polyadenylation signal, as well as singly conserved bases highlighted in yellow, alpha helices marked with arrows, the transmembrane domain in purple, domains of unknown function in blue, and an underlined repeat domain.

Protein

Physical features

SMIM23 notably has a transmembrane domain.

The predicted isoelectric point of the unmodified, unprocessed protein in mice is 5.779, while the transmembrane region alone in humans has a predicted isoelectric point of 5.928.[4]

The protein appears to be rich in leucine and glutamic acid, though not to an unusually high degree. It is also poor in all other amino acids except alanine, serine, and glutamine.[5]

The region underlined in the conceptual translation was predicted to be an involucrin repeat.[6]

Post-translational modifications

The transmembrane region has a predicted mass of 1674.2 Da, while the whole protein is 20008.51 Da. This is very close to the molecular weight predicted by UniProt, 20.025 kDa.[7] Antibody product data were examined for banding patterns and weight changes that may have occurred after translation. The C5orf50 Polyclonal Antibody from ThermoFisher Scientific shows a Western blot band at 40 kDa.[8] This suggests a significant amount of post-translational modification through the addition of large components.
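The gap between the predicted and observed masses can be quantified directly. The following is a minimal sketch in Python that uses the UniProt prediction (20.025 kDa) and the Western blot band (40 kDa) quoted above. The function name is illustrative, and the calculation says nothing about what accounts for the extra mass.

```python
def mass_shift(observed_kda: float, predicted_kda: float):
    """Return (observed - predicted in kDa, observed / predicted ratio)."""
    if predicted_kda <= 0:
        raise ValueError("predicted mass must be positive")
    return observed_kda - predicted_kda, observed_kda / predicted_kda


PREDICTED_KDA = 20.025  # UniProt predicted molecular weight, from the text
OBSERVED_KDA = 40.0     # Western blot band, from the text

difference, ratio = mass_shift(OBSERVED_KDA, PREDICTED_KDA)
print(f"extra mass: {difference:.3f} kDa, ratio: {ratio:.2f}")
```

The observed band is roughly twice the predicted mass, a difference of about 20 kDa.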

There are many predicted phosphorylation sites along the sequence, including two protein kinase C phosphorylation sites, a cAMP- and cGMP-dependent protein kinase phosphorylation site, and a tyrosine kinase phosphorylation site.[9] There is also a confidently predicted C-terminal GPI-modification site.[10]

Schematic of the protein, marking the locations of notable features that were identified with some level of confidence. Red marks phosphorylation sites and grey marks the C-terminal GPI-modification site. The transmembrane domain is shown in relation to the rest of the protein.

Secondary structure

Two stretches of alpha helix, from amino acids 33 to 49 and 89 to 136, are predicted by several secondary-structure prediction programs. The most informative of the programs investigated was PELE on Biology Workbench.[5]

Predicted structure of SMIM23 by I-Tasser program.

The predicted 3D structure of the protein is a series of helices,[11] similar to what was predicted by other programs.

Subcellular localization

This human integral membrane protein is predicted to localize to the endoplasmic reticulum.[12] The same localization analysis in other species returned conflicting results, with many programs predicting the protein to be in the cytosol.[12] This raises the possibility that the name is inaccurate, i.e. the protein may not be an integral membrane protein. Further information is needed to support this conclusion.

Expression

There is not enough consensus as to where in the body SMIM23 is expressed. Databases indicate expression mainly in the testes,[13] but this may reflect a lack of data.

Regulation of expression

The promoter region of SMIM23 is approximately 1192 nucleotides long and contains binding sites for various predicted transcription factors.[14]

A possible regulatory element is a predicted stem-loop in the secondary structure of the 5' UTR, with a few areas of conservation across species.[15]

Function and clinical significance

Research has suggested that facial shape in individuals may be influenced by a set of genes that includes SMIM23.[2] Although the paper refers to the gene by an alias (C5orf50), it identifies five genes that likely influence facial shape, specifically in people of European descent. These findings are supported by replication of the phenotypes associated with each gene and by statistical analysis. Consistent with findings elsewhere, the article notes that SMIM23 likely codes for an unknown transmembrane protein. Other studies suggest that a set of genes including SMIM23 may influence human height.[16] Furthermore, a great deal of research is being done on chromosome 5 in general to understand the roles of genes on it, including SMIM23.[17] This could one day provide insight into this gene's specific roles.

Interacting proteins

The following proteins are predicted to interact with SMIM23.

Cilia and Flagella Associated Protein 43, also known as CFAP43 or WDR96, is the most confident of the predicted functional partners and contains a tryptophan-aspartic acid (WD) repeat domain.

SFR1 (SWI5-dependent recombination repair 1) is a component of the SWI5-SFR1 complex, which is required for double-strand break repair via homologous recombination.

COL17A1 is collagen type XVII, alpha 1. It may play a role in overall protein structure.

PRDM16 binds to DNA and acts as a transcriptional regulator. It functions in the differentiation between white and brown adipose tissue. It can also be a repressor of transforming growth factor-beta signaling.[18]

Homology and evolution

There are no known paralogs.

There are more than 100 known orthologs, ranging from primates to small ground animals. From these and from sequence similarity comparisons,[19] an ortholog space can be described. The closest relatives of humans with the SMIM23 gene are primates, so two monkey species were selected; they diverged from humans around 29.4 million years ago and have sequence similarities in the high 70s (percent). Slightly more distant relatives with the gene include a wide variety of animals, such as horses, sea mammals, and bats, all with similarities of 62-69%. Lastly, some distantly related orthologs were included, such as the Tasmanian devil and various scavenger animals, with similarities of 40-61%.

Some portions of the protein remain highly conserved (see the conceptual translation above). The most notable motif is tryptophan 124, leucine 125, and aspartic acid 126. In addition, a BLAST search returned a protein family of unknown function. Two small conserved sequences (LEQ and DLE) are part of the DUF4635 motif. Although not completely conserved in the alignments done with SMIM23, these were labeled in the conceptual translation.[20]
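Locating short motifs such as LEQ, DLE, or the tryptophan-leucine-aspartic acid (WLD) triplet in a protein sequence is a simple substring search. The following is a minimal sketch in Python. The sequence used here is a made-up toy string for illustration, not the real SMIM23 sequence, and the function name is hypothetical.

```python
def find_motifs(sequence: str, motifs):
    """Return {motif: [1-based start positions]} for each motif in the sequence.

    Overlapping occurrences are reported because the search resumes one
    residue after each hit.
    """
    hits = {}
    for motif in motifs:
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert 0-based index to 1-based
            start = sequence.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Toy sequence for illustration only (NOT the SMIM23 protein sequence).
TOY_SEQUENCE = "MALEQKDLEAWLDLEQ"

print(find_motifs(TOY_SEQUENCE, ["LEQ", "DLE", "WLD"]))
```

Applied to the actual protein sequence from the NCBI or UniProt record, the same search would report where each motif falls, which could then be compared with the labeled positions in the conceptual translation.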

Orthologs

A phylogenetic tree of the SMIM23 gene in the animals listed in the table. Abbreviations refer to common names, e.g. Hu SMIM23 refers to the human gene.

The protein was not found in bacteria, archaea, protists, plants, fungi, invertebrates, reptiles, or birds. All orthologs found were in mammals.[3] An unrooted phylogenetic tree[5] of SMIM23 was created from a few close, moderately related, and distant orthologs (listed in the table). In the tree, the larger the distance (length of line), the longer the time since the last common ancestor. Sequence identity refers to exactly matching amino acids, while similarity also counts amino acids with similar properties.
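The difference between identity and similarity can be illustrated on a toy pairwise alignment, under the usual convention that identity counts exact matches and similarity also counts conservative substitutions. The following is a minimal sketch in Python. The similarity groups are a simplified assumption (alignment tools such as EMBOSS Needle use a substitution matrix instead), the percentages are taken over the full alignment length including gaps (also an assumption), and the aligned sequences are made up.

```python
# Simplified groups of amino acids treated as "similar" (an assumption;
# real tools score similarity with a substitution matrix such as BLOSUM62).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both strings must have the same length; '-' marks a gap. Percentages are
    computed over the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward the length only
        if x == y:
            identical += 1
            similar += 1  # identical residues are also similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


# Toy alignment for illustration only (not real SMIM23 data).
print(identity_and_similarity("MLEQDLEW-A", "MIEQDVDWKA"))
```

Because every identical position is also counted as similar, similarity is always at least as high as identity.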

Genus and Species[3] | Common Name[3] | Date of Divergence (MYA)[21] | Sequence Identity (%)[5] | Sequence Similarity (%)[3]
Cercocebus atys | Sooty mangabey | 29.44 | 73.8 | 77.8
Macaca mulatta | Rhesus monkey | 29.44 | 73.3 | 78.3
Galeopterus variegatus | Sunda flying lemur | 76 | 56.5 | 67
Tupaia chinensis | Chinese tree shrew | 82 | 54.7 | 66
Castor canadensis | American beaver | 90 | 54.1 | 65
Microtus ochrogaster | Prairie vole | 90 | 54.7 | 64.2
Mustela putorius furo | Ferret | 96 | 59.9 | 62
Equus caballus | Horse | 96 | 57 | 68.2
Odobenus rosmarus | Walrus | 96 | 59.3 | 66.4
Acinonyx jubatus | Cheetah | 96 | 58.7 | 63
Ursus maritimus | Polar bear | 96 | 58.1 | 69.3
Camelus ferus | Wild bactrian camel | 96 | 55.2 | 62.2
Dasypus novemcinctus | Nine-banded armadillo | 105 | 31.2 | 40.2
Echinops telfairi | Lesser hedgehog tenrec | 105 | 50 | 61
Sarcophilus harrisii | Tasmanian devil | 159 | 34.7 | 47.7
Monodelphis domestica | Gray short-tailed opossum | 159 | 28.5 | 44.6
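The trend in the table can be summarized by averaging sequence identity for each divergence date. The following is a minimal sketch in Python using the values from the table. The variable and function names are illustrative, and this is only a descriptive summary, not a phylogenetic analysis.

```python
from collections import defaultdict

# (common name, divergence in MYA, identity %, similarity %) from the table.
ORTHOLOGS = [
    ("Sooty mangabey", 29.44, 73.8, 77.8),
    ("Rhesus monkey", 29.44, 73.3, 78.3),
    ("Sunda flying lemur", 76, 56.5, 67),
    ("Chinese tree shrew", 82, 54.7, 66),
    ("American beaver", 90, 54.1, 65),
    ("Prairie vole", 90, 54.7, 64.2),
    ("Ferret", 96, 59.9, 62),
    ("Horse", 96, 57, 68.2),
    ("Walrus", 96, 59.3, 66.4),
    ("Cheetah", 96, 58.7, 63),
    ("Polar bear", 96, 58.1, 69.3),
    ("Wild bactrian camel", 96, 55.2, 62.2),
    ("Nine-banded armadillo", 105, 31.2, 40.2),
    ("Lesser hedgehog tenrec", 105, 50, 61),
    ("Tasmanian devil", 159, 34.7, 47.7),
    ("Gray short-tailed opossum", 159, 28.5, 44.6),
]


def mean_identity_by_divergence(rows):
    """Average the identity column for each distinct divergence date."""
    groups = defaultdict(list)
    for _name, divergence_mya, identity, _similarity in rows:
        groups[divergence_mya].append(identity)
    return {mya: sum(vals) / len(vals) for mya, vals in sorted(groups.items())}


for mya, mean_identity in mean_identity_by_divergence(ORTHOLOGS).items():
    print(f"{mya:>7} MYA: mean identity {mean_identity:.1f}%")
```

The primates at 29.44 MYA have the highest mean identity and the marsupials at 159 MYA the lowest, although the trend is not strictly monotonic across the intermediate groups.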

References

  1. Database, GeneCards Human Gene. "SMIM23 Gene - GeneCards | SIM23 Protein | SIM23 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=SMIM23&keywords=smim23. 
  2. Liu, Fan; Lijn, Fedde van der; Schurmann, Claudia; Zhu, Gu; Chakravarty, M. Mallar; Hysi, Pirro G.; Wollstein, Andreas; Lao, Oscar et al. (2012-09-13). "A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans". PLOS Genetics 8 (9): e1002932. doi:10.1371/journal.pgen.1002932. ISSN 1553-7404. PMID 23028347.
  3. "SMIM23 small integral membrane protein 23 [Homo sapiens (human)] - Gene - NCBI". https://www.ncbi.nlm.nih.gov/gene/644994.
  4. Program by Dr. Luca Toldo, developed at http://www.embl-heidelberg.de. Changed by Bjoern Kindler to print also the lowest found net charge. Available at EMBL WWW Gateway to Isoelectric Point Service "EMBL WWW Gateway to Isoelectric Point Service". Archived from the original on 2008-10-26. https://web.archive.org/web/20081026062821/http://www.embl-heidelberg.de/cgi/pi-wrapper.pl. Retrieved 2014-05-10. 
  5. Workbench, NCSA Biology. "SDSC Biology Workbench". http://workbench.sdsc.edu/.
  6. EMBL-EBI. "RADAR - Rapid Automatic Detection and Alignment of Repeats in protein sequences < EMBL-EBI" (in en). http://www.ebi.ac.uk/Tools/pfa/radar/. 
  7. "SMIM23 - Small integral membrane protein 23 - Homo sapiens (Human) - SMIM23 gene & protein" (in en). https://www.uniprot.org/uniprot/A6NLE4. 
  8. "C5orf50 Antibody" (in en). https://www.thermofisher.com/antibody/product/C5orf50-Antibody-Polyclonal/PA5-46409. 
  9. Sigrist CJ, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2010; 38(Database issue):D161-6.
  10. Eisenhaber B., Bork P., Eisenhaber F. "Prediction of potential GPI-modification sites in proprotein sequences" JMB (1999) 292 (3), 741-758
  11. "I-TASSER server for protein structure and function prediction". http://zhanglab.ccmb.med.umich.edu/I-TASSER/. 
  12. "PSORT II server - GenScript". http://www.genscript.com/tools/psort.
  13. EMBL-EBI Expression Atlas development team. "Expression summary for SMIM23 - homo sapiens < Expression Atlas < EMBL-EBI". https://www.ebi.ac.uk/gxa/genes/ENSG00000185662?bs=%7B%22homo%20sapiens%22:%5B%22ORGANISM_PART%22%5D%7D&ds=%7B%22kingdom%22:%5B%22animals%22%5D%7D#baseline.
  14. "Genomatix - NGS Data Analysis & Personalized Medicine". https://www.genomatix.de/. 
  15. "The Mfold Web Server | mfold.rit.albany.edu" (in en). http://unafold.rna.albany.edu/?q=mfold. 
  16. Lango Allen, Hana; Estrada, Karol; Lettre, Guillaume; Berndt, Sonja I.; Weedon, Michael N.; Rivadeneira, Fernando; Willer, Cristen J.; Jackson, Anne U. et al. (2010-10-14). "Hundreds of variants clustered in genomic loci and biological pathways affect human height". Nature 467 (7317): 832–838. doi:10.1038/nature09410. ISSN 1476-4687. PMID 20881960. Bibcode: 2010Natur.467..832L.
  17. Schmutz, Jeremy; Martin, Joel; Terry, Astrid; Couronne, Olivier; Grimwood, Jane; Lowry, Steve; Gordon, Laurie A.; Scott, Duncan et al. (2004-09-16). "The DNA sequence and comparative analysis of human chromosome 5" (in en). Nature 431 (7006): 268–274. doi:10.1038/nature02919. ISSN 0028-0836. PMID 15372022. Bibcode: 2004Natur.431..268S.
  18. "STRING: functional protein association networks". http://string-db.org/. 
  19. "The European Bioinformatics Institute - EMBOSS Needle - Pairwise Sequence Alignment". http://www.ebi.ac.uk/Tools/psa/emboss_needle/. 
  20. EMBL-EBI, InterPro. "Protein of unknown function DUF4635 (IPR027880) < InterPro < EMBL-EBI" (in en). http://www.ebi.ac.uk/interpro/entry/IPR027880. 
  21. "TimeTree :: The Timescale of Life". http://www.timetree.org. 

Suggested Reading

  • Liu F, van der Lijn F, Schurmann C, Zhu G, Chakravarty MM, Hysi PG, et al. (2012) A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans. PLoS Genet 8(9): e1002932. https://doi.org/10.1371/journal.pgen.1002932
  • Lowe JK, Maller JB, Pe'er I, Neale BM, Salit J, Kenny EE, et al. (2009) Genome-Wide Association Studies in an Isolated Founder Population from the Pacific Island of Kosrae. PLoS Genet 5(2): e1000365. https://doi.org/10.1371/journal.pgen.1000365
  • Greliche N, Germain M, Lambert J-C, et al. A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis. BMC Medical Genetics. 2013;14:36. doi:10.1186/1471-2350-14-36.
  • Schmutz J et al. (2004). The DNA sequence and comparative analysis of human chromosome 5. Nature, 431(7006), 268-74. https://dx.doi.org/10.1038/nature02919
  • Lango Allen H, Estrada K, Lettre G, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467(7317):832-838. doi:10.1038/nature09410.
  • Rose JE, Behm FM, Drgon T, Johnson C, Uhl GR. Personalized Smoking Cessation: Interactions between Nicotine Dose, Dependence and Quit-Success Genotype Score. Molecular Medicine. 2010;16(7-8):247-253. doi:10.2119/molmed.2009.00159.