Small integral membrane protein 14

SMIM14 encoded in chromosome 4, band 4p14.[1]
3-D rendering of SMIM14 protein via Phyre2

Small integral membrane protein 14, also known as SMIM14 or C4orf34, is a protein encoded on chromosome 4 of the human genome by the SMIM14 gene.[2] SMIM14 has at least 298 orthologs, found mainly in jawed vertebrates, and no paralogs.[3] SMIM14 is classified as a type I transmembrane protein. Although the function of this protein is not well understood, the transmembrane domain of SMIM14 may be involved in ER retention.[4]

Gene

The SMIM14 gene is located on the minus strand at cytogenetic band 4p14 and is 92,567 base pairs in length.[5] The gene has five exons, four of which constitute the open reading frame for SMIM14.[6]

The Kozak sequence, which marks the translation initiation site in most eukaryotic mRNA transcripts, is considered a strong motif in SMIM14.[7] There is no signal peptide in SMIM14; instead, the encoded transmembrane domain acts as the signal sequence. One disulfide bridge is predicted in SMIM14; disulfide bridges stabilize the tertiary (and sometimes quaternary) structure of proteins. There are at least ten polyadenylation signal sequences in the 3′ UTR of the SMIM14 gene, which are associated with 3′-end processing and transcription termination.
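What "strong" means for a Kozak context can be illustrated with a minimal sketch. It assumes the commonly used rule that a context is strong when position -3 is a purine and position +4 is a G (counting the A of the start codon as +1), adequate when only one of the two holds, and weak otherwise. The function name and the example sequence are hypothetical; the sequence is a consensus-like context, not the SMIM14 transcript.

```python
def kozak_strength(mrna: str, start: int) -> str:
    """Classify the context of a start codon at 0-based index `start`.

    Assumed rule: strong = purine at -3 and G at +4; adequate = one of the two;
    weak = neither. The A of the ATG is position +1 and there is no position 0.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no ATG at the given start index")
    if start < 3 or start + 3 >= len(seq):
        raise ValueError("sequence too short to read positions -3 and +4")
    purine_at_minus3 = seq[start - 3] in "AG"  # three bases upstream of the A
    g_at_plus4 = seq[start + 3] == "G"         # base immediately after the ATG
    score = int(purine_at_minus3) + int(g_at_plus4)
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


# Hypothetical consensus-like context, NOT the real SMIM14 mRNA.
EXAMPLE = "GCCACCATGGCG"
print(kozak_strength(EXAMPLE, EXAMPLE.index("ATG")))
```

Applying the same check to the real transcript would require the sequence around the annotated start codon of NM_001317896.2, which is not reproduced here.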

SMIM14 is expressed at four times the level of an average gene.[8]

Gene regulation

Promoter

SMIM14 has seven predicted promoter regions. The promoter with the greatest number of transcripts and CAGE tags is 1,420 base pairs in length. It is found on the minus strand, starting at position 39,638,806 and ending at position 39,640,225. This promoter has five coding transcripts, one of which has a maximum of 105,458 CAGE tags.[9]

Promoter ID Start Position End Position Length (bp) Coding Transcripts
GXP_150112 39,549,547 39,550,812 1,266 0
GXP_3198013 39,583,919 39,584,958 1,040 0
GXP_9520406 39,605,105 39,606,144 1,040 N/A
GXP_9520407 39,626,490 39,627,529 1,040 N/A
GXP_6750876 39,627,082 39,628,121 1,040 1
GXP_3198015 39,638,191 39,639,230 1,040 0
GXP_6750877 39,638,806 39,640,225 1,420 5
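The lengths in the table follow from the inclusive coordinates (end - start + 1). A short sketch, using only the values listed above, recomputes them and picks out the promoter with the most coding transcripts. Treating "N/A" as zero for the ranking is an assumption made for illustration.

```python
# (promoter ID, start, end, coding transcripts); None stands for "N/A" in the table.
PROMOTERS = [
    ("GXP_150112", 39_549_547, 39_550_812, 0),
    ("GXP_3198013", 39_583_919, 39_584_958, 0),
    ("GXP_9520406", 39_605_105, 39_606_144, None),
    ("GXP_9520407", 39_626_490, 39_627_529, None),
    ("GXP_6750876", 39_627_082, 39_628_121, 1),
    ("GXP_3198015", 39_638_191, 39_639_230, 0),
    ("GXP_6750877", 39_638_806, 39_640_225, 5),
]


def span_length(start: int, end: int) -> int:
    """Length of a region given inclusive start and end coordinates."""
    return end - start + 1


lengths = {pid: span_length(start, end) for pid, start, end, _ in PROMOTERS}

# Assumption: promoters with "N/A" coding transcripts rank as zero.
best = max(PROMOTERS, key=lambda row: row[3] or 0)

for pid, length in lengths.items():
    print(pid, length)
print("most coding transcripts:", best[0])
```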

For the SMIM14 gene, the associated CpG sites are found in CpG island 76; additional transcription factors can bind to this promoter to drive SMIM14 gene expression.[10]

Literature-curated Transcription Factors

(via ORegAnno)

SMARCA4
STAT1
RBL2
TRIM28
EGR1
TFAP2C

RNA and expression

SMIM14 has three mRNA transcript variants. Transcript variant 1 is the longest variant, with 6,397 base pairs.[2]

Transcript Length (bp) Accession Number
Transcript variant 1 6,397 NM_001317896.2
Transcript variant 2 6,252 NM_174921
Transcript variant 3 6,263 NM_001317897

SMIM14 is highly expressed in the liver, adrenal gland, colon, and prostate. Its expression is low in peripheral blood lymphocytes, skeletal muscle, and the heart.[11]

Protein

Visualization of SMIM14 protein with helical transmembrane domain (TMD) via the Protter web server.[12]

Transcript variant 1 of SMIM14 encodes a protein of 99 amino acids.[13]

Primary structure

The predicted molecular weight (Mw) of the SMIM14 protein is 10,710.34 Da. Its predicted isoelectric point (pI), the pH at which the protein carries no net electrical charge, is 5.10.[14] The abundance of every amino acid is within the normal range for human proteins.[14]
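The idea behind an isoelectric point can be sketched as follows: sum the Henderson-Hasselbalch charges of the ionizable groups, then search for the pH at which the net charge is zero. This is a minimal illustration, not the method used by the cited tool. The pKa values are assumed textbook-style constants (other scales differ slightly), and the peptide is a hypothetical toy sequence, not SMIM14.

```python
# Assumed pKa values; published scales differ slightly from one another.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # free amino terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # free carboxyl terminus
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection search; valid because net charge falls monotonically with pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


TOY_PEPTIDE = "ACDEFGHIK"  # hypothetical sequence, NOT SMIM14
print(round(isoelectric_point(TOY_PEPTIDE), 2))
```

With the full 99-residue SMIM14 sequence as input, a calculation of this kind would be expected to land near the reported pI, although the exact value depends on the pKa scale chosen.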

Transmembrane domain and motifs

SMIM14 has one transmembrane domain, so it is classified as a single-pass membrane protein.[15] The transmembrane domain spans residues 51–70.[16] A dileucine motif is predicted within the domain; such motifs play a role in sorting transmembrane proteins to endosomes and lysosomes.[17] The N-terminus is positioned in the extracellular space and the C-terminus inside the cell, which further classifies SMIM14 as a type I transmembrane protein.
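The topology described above can be laid out as a simple partition of the 99-residue protein around the transmembrane domain. The sketch below uses only the residue numbers given in the text. The side labels restate the type I orientation, and the function names are illustrative.

```python
PROTEIN_LENGTH = 99          # amino acids, from transcript variant 1
TMD_START, TMD_END = 51, 70  # 1-based, inclusive


def segments(length: int, tmd_start: int, tmd_end: int) -> dict:
    """Split a single-pass protein into N-terminal, TMD, and C-terminal spans."""
    return {
        "N-terminal region (extracellular side)": (1, tmd_start - 1),
        "transmembrane domain": (tmd_start, tmd_end),
        "C-terminal region (intracellular side)": (tmd_end + 1, length),
    }


def span_size(span: tuple) -> int:
    """Number of residues in an inclusive (first, last) span."""
    return span[1] - span[0] + 1


layout = segments(PROTEIN_LENGTH, TMD_START, TMD_END)
for name, span in layout.items():
    print(f"{name}: residues {span[0]}-{span[1]} ({span_size(span)} aa)")
```

The resulting 29-residue C-terminal region matches the "9 out of 29" proline count quoted for the C-terminus in the Orthologs section.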

Secondary structure

An α-helix is predicted within the transmembrane domain.[18] SMIM14 is also predicted to form a random coil near the C-terminus.[18][19] A random coil lacks regular secondary structure, so this region adopts a flexible conformation without stabilizing interactions. Extended strands (E-strands) are predicted throughout the protein.[18][19] E-strands are also a common secondary structure and are characterized by hydrogen bonding between the backbones of neighboring strands.

Within the N-terminus, SMIM14 is predicted to have three palmitoylation sites,[20] which may facilitate the clustering of proteins, and one disulfide bridge, which would stabilize the structure of the protein. A glycosaminoglycan attachment site is also predicted at residues 45–48, proximal to the transmembrane domain.[21] The C-terminus is predicted to have two phosphorylation sites with no kinase assigned and one PKA phosphorylation site.[22]

Subcellular location

SMIM14, a transmembrane protein, localizes to the ER membrane.[4] Although the SMIM14 coding sequence contains no conventional ER retention signal, it has been suggested that the transmembrane domain mediates ER retention.

Homology

SMIM14 has no known paralogs and at least 298 orthologs.

Paralogs

BLAST searches identified no paralogs of the SMIM14 gene in Homo sapiens.[23]

Orthologs

SMIM14 is conserved in most vertebrates, excluding hagfish, lampreys, lobe-finned fish, and lungfish.[23] Among invertebrates, it is conserved in flatworms, roundworms, mollusks, and arthropods. It is also relatively well conserved in more distant relatives, such as sea anemones and corals.

Species Common Name Taxon Date of Divergence (mya) % Identity % Similarity Corrected % Divergence (m) Accession Number
Mastomys coucha Southern multimammate mouse rodentia 90 87.9 98.0 12.9 XP_031198284.1
Phyllostomus discolor pale spear-nosed bat mammalia 96 93.4 99.0 6.70 XP_028361411.1
Manacus vitellinus golden-collared manakin aves 312 85.1 91.1 16.1 XP_017923893.1
Python bivittatus Burmese python reptilia 312 80.2 89.1 22.1 XP_007426519
Nanorana parkeri high Himalaya frog amphibia 352 69.2 79.8 36.8 XP_018420132.1
Danio rerio zebrafish actinopterygii 435 68.0 82.5 38.6 NP_991165.1
Rhincodon typus whale shark chondrichthyes 473 71.8 84.5 33.1 XP_020383770.1
Ciona intestinalis sea vase ascidiacea 676 42.7 55.3 85.1 XP_026690156.1
Strongylocentrotus purpuratus Pacific purple sea urchin echinodermata 684 50.5 68.0 68.3 XP_787363.2
Lingula anatina lamp shell brachiopoda 797 59.0 74.3 52.8 XP_013382479.1
Limulus polyphemus Atlantic horseshoe crab arthropoda 797 49.5 65.0 70.3 XP_013782563.1
Agrilus planipennis emerald ash borer insecta 797 39.8 57.3 92.1 XP_018319678.1
Octopus vulgaris octopus mollusca 797 51.0 64.4 67.3 XP_029637526.1
Strongyloides ratti threadworm nematoda 797 33.3 48.1 110 XP_024504825.1
Exaiptasia pallida sea anemone anthozoa 824 58.2 65.5 54.1 XP_020902189.1
Schistosoma haematobium urinary blood fluke platyhelminthes 824 37.4 53.3 98.3 XP_012793134.1
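The table does not state how the corrected divergence m was computed. As an assumption, most rows are reproduced to within rounding by the relation m = -100 · ln(identity / 100), which corrects the observed difference for multiple substitutions at the same site. The sketch below applies that assumed relation to three rows of the table.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Assumed relation: m = -100 * ln(identity / 100).

    This formula is inferred from the table values, not stated in the source.
    """
    return -100.0 * math.log(percent_identity / 100.0)


# (species, % identity, m as listed in the table)
ROWS = [
    ("Mastomys coucha", 87.9, 12.9),
    ("Python bivittatus", 80.2, 22.1),
    ("Danio rerio", 68.0, 38.6),
]

for species, identity, listed in ROWS:
    print(species, round(corrected_divergence(identity), 1), "listed:", listed)
```

A few rows, such as Phyllostomus discolor, differ slightly from this relation, possibly because the listed identity values are themselves rounded.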

The SMIM14 protein sequence is highly conserved across orthologs near the N-terminus. In contrast, the C-terminus is more variable. Sequence analysis of human SMIM14 suggests that the C-terminus contains a disproportionate number of proline residues (9 out of 29; 31%), with several proline-rich sequences (PXXP).[4] Proline-rich domains are usually associated with protein-protein interactions; thus, the C-terminus is likely to interact with other proteins.
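Proline content and PXXP motifs of the kind described above can be found with a simple scan. The sketch below is illustrative only: the tail sequence is a hypothetical toy string, not the SMIM14 C-terminus, and the function names are not from any cited tool. A lookahead pattern is used so that overlapping motifs are all reported.

```python
import re


def proline_fraction(seq: str) -> float:
    """Fraction of residues that are proline."""
    return seq.upper().count("P") / len(seq)


def pxxp_positions(seq: str) -> list:
    """1-based start positions of every PXXP motif, including overlapping ones."""
    return [m.start() + 1 for m in re.finditer(r"(?=P..P)", seq.upper())]


TOY_TAIL = "APSLPPQAPGSPKE"  # hypothetical sequence, NOT the SMIM14 C-terminus

print("proline fraction:", round(proline_fraction(TOY_TAIL), 2))
print("PXXP starts at:", pxxp_positions(TOY_TAIL))

# The figure quoted in the text: 9 prolines among 29 C-terminal residues.
print("reported C-terminal proline content:", round(9 / 29 * 100), "%")
```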

Protein interactions

SMIM14 is predicted to interact with the FATE1 protein, which is involved in Ca2+ transfer from the ER to mitochondria, a regulatory mechanism for apoptosis.[24][25] SMIM14 is also predicted to interact with LSM4, a glycine-rich protein that plays a role in pre-mRNA splicing.[26][27]

References

  1. Hunt, Sarah E; McLaren, William; Gil, Laurent; Thormann, Anja; Schuilenburg, Helen; Sheppard, Dan; Parton, Andrew; Armean, Irina M et al. (1 January 2018). "Ensembl variation resources". Database 2018. doi:10.1093/database/bay119. PMID 30576484. 
  2. (in en-US) Homo sapiens small integral membrane protein 14 (SMIM14), transcript variant 1, mRNA. 2019-07-07. http://www.ncbi.nlm.nih.gov/nuccore/NM_001317896.2.
  3. "SMIM14 orthologs" (in en). https://www.ncbi.nlm.nih.gov/gene/201895/ortholog/. 
  4. Jun, Mi-Hee; Jun, Young-Wu; Kim, Kun-Hyung; Lee, Jin-A; Jang, Deok-Jin (31 October 2014). "Characterization of the cellular localization of C4orf34 as a novel endoplasmic reticulum resident protein". BMB Reports 47 (10): 563–568. doi:10.5483/bmbrep.2014.47.10.252. PMID 24499674.
  5. Chalifa-Caspi, V.; Shmueli, O; Benjamin-Rodrig, H; Rosen, N; Shmoish, M; Yanai, I; Ophir, R; Kats, P et al. (1 January 2003). "GeneAnnot: Interfacing GeneCards with high-throughput gene expression compendia". Briefings in Bioinformatics 4 (4): 349–360. doi:10.1093/bib/4.4.349. PMID 14725348. 
  6. "SMIM14 Gene - GeneCards | SIM14 Protein | SIM14 Antibody". https://www.genecards.org/cgi-bin/carddisp.pl?gene=SMIM14#domains_families. 
  7. Hernández, Greco; Osnaya, Vincent G.; Pérez-Martínez, Xochitl (1 December 2019). "Conservation and Variability of the AUG Initiation Codon Context in Eukaryotes". Trends in Biochemical Sciences 44 (12): 1009–1021. doi:10.1016/j.tibs.2019.07.001. PMID 31353284.
  8. "AceView: Gene:C4orf34, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=human&q=C4orf34.
  9. Cartharius, K.; Frech, K.; Grote, K.; Klocke, B.; Haltmeier, M.; Klingenhoff, A.; Frisch, M.; Bayerlein, M. et al. (1 July 2005). "MatInspector and beyond: promoter analysis based on transcription factor binding sites". Bioinformatics 21 (13): 2933–2942. doi:10.1093/bioinformatics/bti473. PMID 15860560. 
  10. Kent, W. J.; Sugnet, C. W.; Furey, T. S.; Roskin, K. M.; Pringle, T. H.; Zahler, A. M.; Haussler, D. (16 May 2002). "The Human Genome Browser at UCSC". Genome Research 12 (6): 996–1006. doi:10.1101/gr.229102. PMID 12045153.
  11. "49002542 - GEO Profiles - NCBI". https://www.ncbi.nlm.nih.gov/geoprofiles/49002542. 
  12. "Protter - interactive protein feature visualization". http://wlab.ethz.ch/protter/start/. 
  13. "small integral membrane protein 14 [Homo sapiens] - Protein - NCBI". https://www.ncbi.nlm.nih.gov/protein/961525486.
  14. Brendel, V.; Bucher, P.; Nourbakhsh, I. R.; Blaisdell, B. E.; Karlin, S. (15 March 1992). "Methods and algorithms for statistical analysis of protein sequences". Proceedings of the National Academy of Sciences 89 (6): 2002–2006. doi:10.1073/pnas.89.6.2002. PMID 1549558. Bibcode: 1992PNAS...89.2002B.
  15. Kall, L.; Krogh, A.; Sonnhammer, E. L.L. (8 May 2007). "Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server". Nucleic Acids Research 35 (Web Server): W429–W432. doi:10.1093/nar/gkm256. PMID 17483518. 
  16. Gouw, Marc; Michael, Sushama; Sámano-Sánchez, Hugo; Kumar, Manjeet; Zeke, András; Lang, Benjamin; Bely, Benoit; Chemes, Lucía B et al. (4 January 2018). "The eukaryotic linear motif resource – 2018 update". Nucleic Acids Research 46 (D1): D428–D434. doi:10.1093/nar/gkx1077. PMID 29136216. 
  17. Bonifacino, Juan S.; Traub, Linton M. (June 2003). "Signals for Sorting of Transmembrane Proteins to Endosomes and Lysosomes". Annual Review of Biochemistry 72 (1): 395–447. doi:10.1146/annurev.biochem.72.121801.161800. PMID 12651740. 
  18. Combet, C; Blanchet, C; Geourjon, C; Deléage, G (March 2000). "NPS@: Network Protein Sequence Analysis". Trends in Biochemical Sciences 25 (3): 147–150. doi:10.1016/s0968-0004(99)01540-6. PMID 10694887.
  19. Ashok Kumar, T (1 April 2013). "CFSSP: Chou and Fasman Secondary Structure Prediction server". Wide Spectrum 1 (9): 15–19. doi:10.5281/ZENODO.50733.
  20. Ren, J.; Wen, L.; Gao, X.; Jin, C.; Xue, Y.; Yao, X. (27 August 2008). "CSS-Palm 2.0: an updated software for palmitoylation sites prediction". Protein Engineering Design and Selection 21 (11): 639–644. doi:10.1093/protein/gzn039. PMID 18753194. 
  21. Gouw, Marc; Michael, Sushama; Sámano-Sánchez, Hugo; Kumar, Manjeet; Zeke, András; Lang, Benjamin; Bely, Benoit; Chemes, Lucía B et al. (4 January 2018). "The eukaryotic linear motif resource – 2018 update". Nucleic Acids Research 46 (D1): D428–D434. doi:10.1093/nar/gkx1077. PMID 29136216. 
  22. Blom, Nikolaj; Sicheritz-Pontén, Thomas; Gupta, Ramneek; Gammeltoft, Steen; Brunak, Søren (June 2004). "Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence". Proteomics 4 (6): 1633–1649. doi:10.1002/pmic.200300771. PMID 15174133. 
  23. Altschul, Stephen F.; Gish, Warren; Miller, Webb; Myers, Eugene W.; Lipman, David J. (October 1990). "Basic local alignment search tool" (in en). Journal of Molecular Biology 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712.
  24. "FATE1 - Fetal and adult testis-expressed transcript protein - Homo sapiens (Human) - FATE1 gene & protein". https://www.uniprot.org/uniprot/Q969F0. 
  25. Doghman‐Bouguerra, Mabrouka; Granatiero, Veronica; Sbiera, Silviu; Sbiera, Iuliu; Lacas‐Gervais, Sandra; Brau, Frédéric; Fassnacht, Martin; Rizzuto, Rosario et al. (September 2016). "FATE 1 antagonizes calcium‐ and drug‐induced apoptosis by uncoupling ER and mitochondria". EMBO Reports 17 (9): 1264–1280. doi:10.15252/embr.201541504. PMID 27402544. 
  26. "LSM4 - U6 snRNA-associated Sm-like protein LSm4 - Homo sapiens (Human) - LSM4 gene & protein". https://www.uniprot.org/uniprot/Q9Y4Z0. 
  27. Bertram, Karl; Agafonov, Dmitry E.; Dybkov, Olexandr; Haselbach, David; Leelaram, Majety N.; Will, Cindy L.; Urlaub, Henning; Kastner, Berthold et al. (August 2017). "Cryo-EM Structure of a Pre-catalytic Human Spliceosome Primed for Activation". Cell 170 (4): 701–713.e11. doi:10.1016/j.cell.2017.07.011. PMID 28781166.