Biology:SOGA2

SOGA2, also known as Suppressor of glucose, autophagy associated 2 or CCDC165, is a protein that in humans is encoded by the SOGA2 gene.^[1]^[2] SOGA2 has two human paralogs, SOGA1 and SOGA3.^[3]^[4] In humans, the gene coding sequence is 151,349 base pairs long, with an mRNA of 6092 nucleotides and a protein sequence of 1586 amino acids. The SOGA2 gene is conserved in gorilla, baboon, galago, rat, mouse, cat, and other mammals. More distant conservation is seen in organisms such as zebra finches and anoles.^[5] SOGA2 is ubiquitously expressed in humans, with particularly high expression in the brain (especially the cerebellum and hippocampus), colon, pituitary gland, small intestine, spinal cord, testis, and fetal brain.^[6]

Gene

Locus

The SOGA2 gene is located on the short arm of chromosome 18 (18p11.22), from position 8717369 to position 8832775.^[7]

Homology and Evolution

Paralogs

SOGA2 has two main paralogs: human protein SOGA1 and human protein SOGA3.^[5] SOGA1 has been shown to be involved in the suppression of glucose by autophagy.^[8] The rate at which orthologs diverge from human SOGA2 (measured by % identity) places the duplication event that separated SOGA1 from SOGA2 at approximately 254.1 MYA and the one that separated SOGA3 from SOGA2 at approximately 329.1 MYA.

protein name	accession number	sequence length (aa)	sequence identity to human protein	notes
SOGA3	NP_001012279.1	947	58%	conserved in ~500 N-terminal aa
SOGA1 isoform 2	NP_954650.2	1016	65%	conserved in first ~900 aa
SOGA1 isoform 1	NP_542194.2	1661	41%	conserved across the length of sequence except ~950-1150

Orthologs

Many orthologs have been identified in Eukaryotes.^[5]

common name	protein name	divergence from human lineage (MYA)	accession number	sequence length (aa)	sequence identity to human protein	protein domain differences
gorilla	protein SOGA2	8.8	XP_004059220.1	1586	99%
baboon	protein SOGA2	29	XP_003914218	1587	98%
galago	protein SOGA3	74	XP_003801047.1	1583	88%	DUF4201 not present
rat	CCDC165	92.3	XP_237548.6	2060	81%	DUF4201 not present
mouse	SOGA2	92.3	NP_001107570.1	1893	80%
house cat	protein SOGA2	94.2	XP_003995077.1	1700	84%	DUF4201 not present
cow	CCDC166	94.2	XP_581047.5	1525	74%	DUF4201 not present
African elephant	CCDC167-like	98.7	XP_003406836.1	1544	73%
zebra finch	protein SOGA2	296	XP_002193121.1	1598	69%	DUF4201 not present
red junglefowl	CCDC165	296	XP_423729.3	1600	70%	DUF4201 not present
Carolina anole	uncharacterized protein KIAA0802-like	296	XP_003225723.1	1839	67%	DUF4201 not present

A graph of sequence identity to human SOGA2 as a function of time of divergence of human SOGA2 orthologs.
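
The relationship plotted in this graph can be approximated with a straight-line fit, which in turn gives a rough way to date a paralog from its percent identity. The following is a minimal sketch, assuming an ordinary least-squares line through the ortholog values tabulated above. The function names `fit_line` and `estimate_divergence` are illustrative, and the duplication dates quoted in the Paralogs section were not necessarily derived this way.

```python
# (divergence from human lineage in MYA, % identity to human SOGA2),
# copied from the ortholog table above.
ORTHOLOGS = [
    (8.8, 99), (29, 98), (74, 88), (92.3, 81), (92.3, 80), (94.2, 84),
    (94.2, 74), (98.7, 73), (296, 69), (296, 70), (296, 67),
]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_divergence(identity_pct, slope, intercept):
    """Invert the fitted line: time (MYA) at which identity equals identity_pct."""
    return (identity_pct - intercept) / slope


if __name__ == "__main__":
    slope, intercept = fit_line(ORTHOLOGS)
    print(f"identity = {intercept:.1f} {slope:+.4f} * t")
    # Percent identities taken from the paralog table.
    for name, identity in [("SOGA1 isoform 2", 65), ("SOGA3", 58)]:
        t = estimate_divergence(identity, slope, intercept)
        print(f"{name}: roughly {t:.0f} MYA under this linear assumption")
```

A linear model is the simplest possible choice here; sequence identity saturates over long time spans, so estimates for distant homologs from such a fit should be treated with caution.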

Distant Homologs

common name	protein name	divergence from human lineage (MYA)	accession number	sequence length (aa)	sequence identity to human protein	protein domain differences
tropical clawed frog	uncharacterized protein C20orf117-like	371.2	XP_002942331.1	1584	39%
purple sea urchin	uncharacterized protein LOC578090	742.9	XP_783370.2	1587	47%	DUF4201 not present
body louse	Centromeric protein E, putative	782.7	XP_002429877.1	2086	30%	no shared domains
southern house mosquito	conserved hypothetical protein	782.7	XP_001843754.1	1878	32%	no shared domains
porkworm	surface antigen repeat family protein	937.5	XP_003380263.1	2030	36%	no shared domains

Homologous Domains

The N-terminal region of SOGA2, which contains its three domains of unknown function, is the part of the protein conserved across the most distantly related species.^[9]

A comparison of multiple sequence alignments of the N-terminal and C-terminal regions of distantly related SOGA2 orthologs. The N-terminal region is well conserved in organisms like the clawed frog (FROG_SOGA2), but the C-terminal region is not. Location 19 is an example of one of the 7 leucine residues that are conserved across all orthologs.

Protein

Protein internal composition

SOGA2 is rich in glycine (ratio r of SOGA2 composition to average human protein is 1.723), glutamate (r = 1.647), and arginine (r = 1.357). It also has a lower than usual composition of tyrosine (r = 0.3406), isoleucine (r = 0.4430), phenylalanine (r = 0.5808), and valine (r = 0.6161).^[10]^[11]
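
The ratio r used above is the fraction of a given residue in SOGA2 divided by its fraction in an average human protein. A minimal sketch of that calculation follows. The helper names, the toy sequence, and the toy background frequencies are placeholders chosen for illustration; they are not the SOGA2 sequence or the reference values used by the cited sources.

```python
from collections import Counter


def composition(sequence):
    """Return the fraction of each amino acid (one-letter code) in a sequence."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {aa: n / len(sequence) for aa, n in counts.items()}


def enrichment_ratios(sequence, background):
    """Return r = (fraction in sequence) / (fraction in background) per residue."""
    fractions = composition(sequence)
    return {
        aa: fractions.get(aa, 0.0) / freq
        for aa, freq in background.items()
        if freq > 0
    }


if __name__ == "__main__":
    toy_sequence = "GGEEGRAY"  # placeholder, not SOGA2
    toy_background = {"G": 0.25, "E": 0.125, "R": 0.125, "Y": 0.25}  # placeholder
    for aa, r in sorted(enrichment_ratios(toy_sequence, toy_background).items()):
        label = "enriched" if r > 1 else "depleted" if r < 1 else "average"
        print(f"{aa}: r = {r:.3f} ({label})")
```

Values of r above 1 indicate enrichment relative to the background and values below 1 indicate depletion, which is how the glycine and tyrosine figures above should be read.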

Primary structure and isoforms

SOGA2 has 4 isoforms: Q9Y4B5-1, Q9Y4B5-2, Q9Y4B5-3, Q9Y4B5-4.^[12]

A graphic depicting the 4 different isoforms of SOGA2. Isoform 1 is canonical. Modification key: * E → ELRGPPVLPEQSVSIEELQGQLVQAARLHQEETETFTNKIHK; ** Q → QNCCGYPRINIEEETLGFTRLPAGSTVKTLKSLGLQRLE; *** NQTVLLTAPWGL → ELPCSALAPS...LHGLSQYNSL

Domains and motifs

SOGA2 contains Domain of Unknown Function 4201 (DUF4201) from aa 16-235. This domain is specific to the Coiled Coil Domain Containing family of proteins in eukaryotes.^[13] It also contains two copies of Domain of Unknown Function 3166 (DUF3166): one from aa 140-235 and one from aa 269-364.^[7]
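
The coordinates above imply that the first DUF3166 copy lies entirely within the DUF4201 span. A short sketch, assuming 1-based inclusive residue coordinates (the `Domain` type and `overlap` helper are illustrative names), makes the relationship explicit:

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, inclusive


def length(domain):
    """Number of residues covered by a domain."""
    return domain.end - domain.start + 1


def overlap(a, b):
    """Number of residues shared by two domains (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


# Coordinates as given in the text above.
DUF4201 = Domain("DUF4201", 16, 235)
DUF3166_1 = Domain("DUF3166 (copy 1)", 140, 235)
DUF3166_2 = Domain("DUF3166 (copy 2)", 269, 364)

if __name__ == "__main__":
    for other in (DUF3166_1, DUF3166_2):
        print(f"{DUF4201.name} vs {other.name}: "
              f"{overlap(DUF4201, other)} of {length(other)} residues shared")
```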

Post-translational modifications

SOGA2 is expected to undergo a number of post-translational modifications. Modifications of human SOGA2 that are shared by orthologs include:

Sumoylation at amino acids 87, 152, 235, 392, and 1379.^[14]
Sulfation at tyrosines 14 and 1249.^[15]
Phosphorylation at a number of sites, highlighted in the following graphic:

Phosphorylation sites in SOGA2 predicted by NetPhos.^[16] Highlighted sites are conserved as far back as clawed frogs.

Secondary structure

The consensus of the prediction tools PELE,^[17] GOR4,^[18] and SOSUICoil^[19] is that the secondary structure of SOGA2 is dominated by alpha helices with interspersed regions of random coil. GOR4 predicted 47.79% of residues in alpha helices and 46.6% in random coils, with only 5.61% in an extended strand (parallel or antiparallel beta-sheet) conformation.

Secondary structure of human SOGA2 predicted by the GOR4 tool. h corresponds to alpha helices, c corresponds to random coils, and e corresponds to extended strands.
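
Percentages like those reported for GOR4 can be obtained by tallying the per-residue state string (h, e, c) that the tool emits. The sketch below assumes such a string as input; the function name `state_percentages` and the short toy prediction are illustrative and are not the actual SOGA2 output.

```python
from collections import Counter

# One-letter states as described in the figure caption.
STATE_NAMES = {"h": "alpha helix", "e": "extended strand", "c": "random coil"}


def state_percentages(prediction):
    """Return the percentage of residues assigned to each secondary-structure state."""
    prediction = prediction.strip().lower()
    if not prediction:
        raise ValueError("prediction string must not be empty")
    unknown = set(prediction) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state letters: {sorted(unknown)}")
    counts = Counter(prediction)
    return {state: 100.0 * counts[state] / len(prediction) for state in STATE_NAMES}


if __name__ == "__main__":
    toy_prediction = "cchhhhhhcceeehhhhccc"  # placeholder, 20 residues
    for state, pct in state_percentages(toy_prediction).items():
        print(f"{STATE_NAMES[state]}: {pct:.2f}%")
```

As a consistency check on the figures quoted in the text, 47.79% + 46.6% + 5.61% sums to 100%, as expected when every residue is assigned exactly one state.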

Tertiary structure

The highly conserved N-terminal region of SOGA2 shares sequence features with proteins of known structure. This homology allows prediction of its tertiary structure on the basis of published 3D structures via Phyre2^[20] and NCBI Structure.^[21]

SOGA2's 3D structure predicted by Phyre2.^[20] The model is based on the crystal structure of tropomyosin at 7 angstrom resolution, with 12% identity. 283 residues match, in the coiled-coil-containing N-terminal region.

3D structure of 1I84 S, heavy meromyosin subfragment of chicken gizzard smooth muscle myosin with regulatory light chain in the dephosphorylated state. The highlighted region is conserved in SOGA2.^[21]

Gene expression

Promoter

The promoter for human SOGA2 is below.

The promoter of the human SOGA2 gene.

Gene expression data

The EST profile shows that, in humans, SOGA2 is highly expressed at many sites throughout the body, including bone, brain, ear, eye, and many others.^[22] A large number of transcripts are found in liver cancer samples. Human microarray data show that SOGA2 is moderately expressed overall, with particularly high expression in the brain (especially the cerebellum and hippocampus), colon, pituitary gland, small intestine, spinal cord, testis, and fetal brain.^[6] Brain-tissue-specific microarray data show that SOGA2 has high expression throughout the posterior lobe of the cerebellar hemispheres and the posterior lobe of the vermis in the mouse brain, and low expression in most other areas of the brain.^[23]

Transcript variants

In humans, the SOGA2 gene produces 17 different transcripts, 8 of which form a protein product (one undergoes nonsense-mediated decay). The main transcript in humans is transcript ID ENST00000359865, or SOGA2-001.^[24]

Function

Possible transcription factors

Possible transcription factors for human SOGA2 include:^[25]

Modulator recognition factor 2
cAMP-responsive element binding protein 1
alternative splicing variant of FOXP1
MDS1/EVI1-like gene 1
Ikaros 2, possible regulator of lymphocyte differentiation

Interactions

Protein complex co-immunoprecipitation (Co-IP) experiments revealed interacting proteins such as cell death regulators, ATP-binding cassette (ABC) transporters and protein kinase A binding proteins.^[26]

The 540 interacting proteins include ABCF1, ACTB, ACTL6A, BCLAF1, CHEK1, and MAGEE2.^[26]

K-nearest neighbor analysis by WoLF PSORT indicates that, in humans, SOGA2 localizes mainly to the nucleus, the cytoplasm, and the cytonuclear space. There is a small chance that it localizes to the Golgi apparatus.^[27]

A number of protein interactants were also identified via the STRING database, including MARK2, MARK4, and PPP2R2B.

Clinical significance

SOGA2 has no currently known disease associations or mutations.

References

1. Nagase T; Ishikawa K; Suyama M; Kikuno R; Miyajima N; Tanaka A; Kotani H; Nomura N et al. (April 1999). "Prediction of the coding sequences of unidentified human genes. XI. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro". DNA Res 5 (5): 277–86. doi:10.1093/dnares/5.5.277. PMID 9872452.
2. "Entrez Gene: SOGA2". https://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=ShowDetailView&TermToSearch=23255.
3. "SOGA1". NCBI. https://www.ncbi.nlm.nih.gov/protein/NP_542194.2.
4. "SOGA3". NCBI. https://www.ncbi.nlm.nih.gov/protein/Q5TF21.1.
5. "BLAST". NCBI BLAST. http://blast.ncbi.nlm.nih.gov.
6. "GEO Profile 10132039". NCBI GEO. https://www.ncbi.nlm.nih.gov/geoprofiles/10132039.
7. "NCBI". National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/protein/NP_056025.2.
8. "Adiponectin lowers glucose production by increasing SOGA". Am. J. Pathol. 177 (4): 1936–45. October 2010. doi:10.2353/ajpath.2010.100363. PMID 20813965.
9. "CLUSTALW". SDSC Biology Workbench. http://seqtool.sdsc.edu/CGI/BW.cgi#!. [permanent dead link]
10. "CLC Sequence Viewer". http://seqtool.sdsc.edu/CGI/BW.cgi#!. [permanent dead link]
11. Nagase T; Ishikawa K; Suyama M; Kikuno R; Miyajima N; Tanaka A; Kotani H; Nomura N et al. (Jan 2011). "Computational analysis of amino acid composition in human proteins". Bioinformatics Trends 6 (1&2): 39–43.
12. "GeneCards". https://www.genecards.org/cgi-bin/carddisp.pl?gene=SOGA2&search=SOGA2.
13. "NCBI Conserved Domains". National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=pfam13870&seltype=1.
14. "SumoPlot". ABGENT. http://www.abgent.com/tools/sumoplot.
15. "Sulfinator". ExPASy. http://web.expasy.org/sulfinator/.
16. Blom N; Gammeltoft S; Brunak S (December 1999). "Sequence and structure-based prediction of eukaryotic protein phosphorylation sites". J. Mol. Biol. 294 (5): 1351–62. doi:10.1006/jmbi.1999.3310. PMID 10600390.
17. "PELE". SDSC Biology Workbench. http://seqtool.sdsc.edu/CGI/BW.cgi#!. [permanent dead link]
18. "GOR4". npsa-pbil. http://npsa-pbil.ibcp.fr/cgi-bin/secpred. [permanent dead link]
19. "SOSUICoil". bp.nuap.nagoya-u.ac.jp. http://bp.nuap.nagoya-u.ac.jp/sosui/coil/submit.html.
20. Kelley LA; Sternberg MJ (2009). "Protein structure prediction on the Web: a case study using the Phyre server". Nat Protoc 4 (3): 363–71. doi:10.1038/nprot.2009.2. PMID 19247286. http://spiral.imperial.ac.uk/bitstream/10044/1/18157/2/Nature%20Protocols_4_3_2009.pdf.
21. "NCBI Structure". NCBI. https://www.ncbi.nlm.nih.gov/Structure/cblast/cblast.cgi.
22. "Unigene". National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/unigene.
23. "Allen Brain Atlas, SOGA2 microarray experiments". Allen Brain Atlas. http://human.brain-map.org/microarray/search/show?exact_match=true&search_term=SOGA2&search_type=gene&donors=9861,12876,14380,10021,15496,15697.
24. "Ensembl: gene SOGA2". Ensembl. http://www.ensembl.org/Homo_sapiens/Transcript/Summary?db=core;g=ENSG00000168502;r=18:8717369-8832776;t=ENST00000359865.
25. "El Dorado". Genomatix. http://www.genomatix.de/cgi-bin//eldorado. [permanent dead link]
26. "Molecular Interaction Database - MINT". http://mint.bio.uniroma2.it/mint/Welcome.do.
27. "WoLF PSORT: protein localization predictor". Nucleic Acids Res. 35 (Web Server issue): W585–7. July 2007. doi:10.1093/nar/gkm259. PMID 17517783.
