Genome mining

[Figure: Genome mining is associated with bioinformatics investigations.]

Genome mining describes the exploitation of genomic information for the discovery of biosynthetic pathways of natural products and their possible interactions.[1] It depends on computational technology and bioinformatics tools. The mining process relies on the large amount of data (DNA sequences and their annotations) accessible in genomic databases. By applying data mining algorithms, researchers can use these data to generate new knowledge in several areas of medicinal chemistry,[2][3] such as the discovery of novel natural products.[4]

History

In the mid- to late 1980s, as sequencing technologies advanced, researchers increasingly focused on genetic studies.[5] The GenBank database had been established in 1982 to collect, manage, store, and distribute DNA sequence data in response to the growing availability of DNA sequences. Since 1992, as the amount of genetic data has grown, biotechnology companies have used human DNA sequences to develop protein and antibody drugs through genome mining.[6] By the late 1990s, companies such as Amgen, Immunex, and Genentech had used genome mining to develop drugs that progressed to clinical trials.[7] Since the completion of the Human Genome Project in the early 2000s, researchers have sequenced the genomes of many microorganisms.[8] Many of these genomes have since been studied in detail to identify new genes and biosynthetic pathways.[9]

Algorithms

As large quantities of genomic sequence data began to accumulate in public databases, bioinformatics algorithms became important for deciphering this enormous collection of data.[10] The following algorithms and tools are commonly used:

  • AntiSMASH (Antibiotics and Secondary Metabolite Analysis Shell)[11] identifies, annotates, and analyzes secondary metabolite biosynthetic gene clusters in bacterial and fungal genome sequences.[12]
  • PRISM (Prediction Informatics for Secondary Metabolites)[13] uses a combinatorial approach to predict the chemical structures of genetically encoded nonribosomal peptides and type I and II polyketides.[14]
  • SIM (statistically based sequence similarity) methods, such as FASTA and PSI-BLAST, infer homologous (for example, orthologous) relationships between sequences.[15]
  • BLAST (Basic Local Alignment Search Tool) is an algorithm for rapidly comparing a query sequence against a sequence database.[16]
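FASTA and BLAST are fast heuristics for local sequence alignment. As an illustration of the underlying idea, the following is a minimal Python sketch of exact local alignment (the Smith-Waterman algorithm) with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not the defaults of FASTA, BLAST, or any other tool, and real searches use substitution matrices and affine gap penalties.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring parameters are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The 0 floor lets an alignment start anywhere (the "local" part)
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best-scoring cell until the score drops to zero
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, x, y = smith_waterman("TTTACGTTTT", "GGACGTGG")
print(score, x, y)
```

For these two toy sequences the unrelated flanks are ignored and only the shared ACGT core is reported, with a score of 8 (four matches at +2 each).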

Applications

Genome mining is applied to natural product discovery, where it facilitates the characterization of novel molecules and biosynthetic pathways.[4][17]

Natural product discovery

The production of natural products is directed by biosynthetic gene clusters (BGCs) encoded in the microorganism's genome.[18] Genome mining can predict which BGCs produce a target natural product.[19] Important enzyme families responsible for natural product formation include polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs); other major BGC classes produce ribosomally synthesized and post-translationally modified peptides (RiPPs) and terpenoids, among many others.[20] By mining genomes for these enzymes, researchers can determine which classes of product the BGCs encode and compare target gene clusters with known gene clusters.[21] To verify the relationship between BGCs and natural products, the target BGCs can be cloned and expressed in a suitable host (heterologous expression).[22]
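Candidate BGCs are often located by first finding "signature" biosynthetic genes and then grouping neighbouring ones. The following Python sketch illustrates that idea under simplifying assumptions: genes are already annotated with hypothetical domain labels (KS, AT, C, A, terpene_synth), a gene is assigned a class when all of that class's required labels are present, and core genes separated by at most a fixed distance are merged into one candidate cluster. The rules, thresholds, and coordinates are invented for illustration and are not those of AntiSMASH or any other tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical rules: a gene is a "core" biosynthetic gene of a class when it
# carries every domain label listed for that class (labels are illustrative).
RULES: Dict[str, Set[str]] = {
    "PKS": {"KS", "AT"},           # ketosynthase + acyltransferase
    "NRPS": {"C", "A"},            # condensation + adenylation
    "terpene": {"terpene_synth"},  # terpene synthase
}


@dataclass
class Gene:
    name: str
    start: int  # coordinate in bp
    end: int
    domains: Set[str]


@dataclass
class Candidate:
    start: int
    end: int  # not clipped to the genome length in this sketch
    classes: Set[str] = field(default_factory=set)
    core_genes: List[str] = field(default_factory=list)


def classify(gene: Gene) -> Set[str]:
    """Return every class whose required domains are all present in the gene."""
    return {cls for cls, required in RULES.items() if required <= gene.domains}


def find_candidates(genes: List[Gene], max_gap: int = 20_000,
                    flank: int = 10_000) -> List[Candidate]:
    """Merge core genes separated by at most max_gap bp into candidate BGCs,
    then pad each candidate with `flank` bp on both sides."""
    cores = sorted((g for g in genes if classify(g)), key=lambda g: g.start)
    groups: List[List[Gene]] = []
    for gene in cores:
        if groups and gene.start - max(g.end for g in groups[-1]) <= max_gap:
            groups[-1].append(gene)  # close enough: extend the current group
        else:
            groups.append([gene])    # too far away: start a new group
    return [
        Candidate(
            start=max(0, min(g.start for g in grp) - flank),
            end=max(g.end for g in grp) + flank,
            classes=set().union(*(classify(g) for g in grp)),
            core_genes=[g.name for g in grp],
        )
        for grp in groups
    ]


# Toy genome with made-up coordinates and annotations
genes = [
    Gene("orf1", 1_000, 7_000, {"KS", "AT", "ACP"}),
    Gene("orf2", 8_000, 12_000, {"C", "A", "PCP"}),
    Gene("orf3", 13_000, 14_000, {"MFS"}),
    Gene("orf4", 150_000, 151_500, {"terpene_synth"}),
]
cands = find_candidates(genes)
for c in cands:
    print(c.start, c.end, sorted(c.classes), c.core_genes)
```

In this toy genome, orf1 and orf2 fall into one hybrid PKS/NRPS candidate, orf3 is not a core gene, and the distant terpene synthase gene forms a second candidate.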

Databases and tools

Genetic data have accumulated in databases, and researchers use algorithms to decipher these data for the discovery of new processes, targets, and products.[10] The following databases and tools are commonly used:

  • GenBank database provides genomic datasets for analysis.[23]
  • UCSC Genome Browser is a web-based browser for viewing genome assemblies together with their annotation tracks.
  • AntiSMASH-DB[11][24] allows comparing the sequences of newly sequenced BGCs against those of previously predicted and experimentally characterized ones.[25]
  • BiG-FAM[26] is a database of biosynthetic gene cluster families.[27]
  • DoBISCUIT[28] is a database of secondary metabolite biosynthetic gene clusters.[29]
  • MIBiG (Minimum Information about a Biosynthetic Gene cluster specification)[30] provides a standard for annotations and metadata on biosynthetic gene clusters and their molecular products.[31]
  • Interactive tree of life (iTOL)[32] is a web-based tool for the display, manipulation and annotation of phylogenetic trees.[33]
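Comparing a newly found cluster with previously characterized ones, as described for AntiSMASH-DB, and grouping clusters into families, as in BiG-FAM, both depend on some measure of similarity between clusters. The following Python sketch assumes each cluster is reduced to the set of its (hypothetical) domain labels and ranks reference clusters by the Jaccard index of their domain content. The accessions REF_A to REF_C and their domains are invented; dedicated tools generally combine several similarity measures rather than relying on this one alone.

```python
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_known_clusters(
    query: FrozenSet[str], reference: Dict[str, FrozenSet[str]], top: int = 3
) -> List[Tuple[str, float]]:
    """Return up to `top` reference clusters sorted by decreasing similarity."""
    scored = [(acc, jaccard(query, doms)) for acc, doms in reference.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Hypothetical reference clusters (invented accessions and domain content)
reference = {
    "REF_A": frozenset({"KS", "AT", "ACP", "KR", "DH", "TE"}),
    "REF_B": frozenset({"C", "A", "PCP", "TE"}),
    "REF_C": frozenset({"terpene_synth"}),
}
query = frozenset({"KS", "AT", "ACP", "KR", "TE"})
ranked = rank_known_clusters(query, reference)
for acc, score in ranked:
    print(f"{acc}\t{score:.3f}")
```

Here the polyketide-like query shares five of six domain labels with REF_A (index 5/6) and only the thioesterase label with REF_B (index 1/8), so REF_A is ranked as its closest known relative.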

References

  1. "Genome Mining as New Challenge in Natural Products Discovery". Marine Drugs 18 (4): 199. April 2020. doi:10.3390/md18040199. PMID 32283638. 
  2. "A deep learning genome-mining strategy for biosynthetic gene cluster prediction". Nucleic Acids Research 47 (18): e110. October 2019. doi:10.1093/nar/gkz654. PMID 31400112. 
  3. "Mini review: Genome mining approaches for the identification of secondary metabolite biosynthetic gene clusters in Streptomyces". Computational and Structural Biotechnology Journal 18: 1548–1556. 2020. doi:10.1016/j.csbj.2020.06.024. PMID 32637051. 
  4. "Genome mining for novel natural product discovery". Journal of Medicinal Chemistry 51 (9): 2618–2628. May 2008. doi:10.1021/jm700948z. PMID 18393407. 
  5. "A novel method for nucleic acid sequence determination". Journal of Theoretical Biology 135 (3): 303–307. December 1988. doi:10.1016/S0022-5193(88)80246-7. PMID 3256722. Bibcode: 1988JThBi.135..303B. 
  6. "Patents in genomics and human genetics". Annual Review of Genomics and Human Genetics 11 (1): 383–425. September 2010. doi:10.1146/annurev-genom-082509-141811. PMID 20590431. 
  7. "The evolution of genome mining in microbes - a review". Natural Product Reports 33 (8): 988–1005. August 2016. doi:10.1039/C6NP00025H. PMID 27272205. 
  8. "Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites". Proceedings of the National Academy of Sciences of the United States of America 98 (21): 12215–12220. October 2001. doi:10.1073/pnas.211433198. PMID 11572948. Bibcode: 2001PNAS...9812215O. 
  9. "Identification of Thiotetronic Acid Antibiotic Biosynthetic Pathways by Target-directed Genome Mining". ACS Chemical Biology 10 (12): 2841–2849. December 2015. doi:10.1021/acschembio.5b00658. PMID 26458099. 
  10. "Data structures and compression algorithms for genomic sequence data". Bioinformatics 25 (14): 1731–1738. July 2009. doi:10.1093/bioinformatics/btp319. PMID 19447783. 
  11. "AntiSMASH-DB". https://antismash-db.secondarymetabolites.org/. 
  12. "antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences". Nucleic Acids Research 39 (Web Server issue): W339–W346. July 2011. doi:10.1093/nar/gkr466. PMID 21672958. 
  13. "PRISM". Adapsyn Bioscience. http://prism.adapsyn.com. 
  14. "Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences". Nature Communications 11 (1): 6058. November 2020. doi:10.1038/s41467-020-19986-1. PMID 33247171. Bibcode: 2020NatCo..11.6058S. 
  15. "Confirmation of data mining based predictions of protein function". Bioinformatics 20 (7): 1110–1118. May 2004. doi:10.1093/bioinformatics/bth047. PMID 14764546. 
  16. "Basic local alignment search tool". Journal of Molecular Biology 215 (3): 403–410. October 1990. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. 
  17. "Mining genomes to illuminate the specialized chemistry of life". Nature Reviews. Genetics 22 (9): 553–571. September 2021. doi:10.1038/s41576-021-00363-7. PMID 34083778. 
  18. "Discovery of microbial natural products by activation of silent biosynthetic gene clusters". Nature Reviews. Microbiology 13 (8): 509–523. August 2015. doi:10.1038/nrmicro3496. PMID 26119570. 
  19. "Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria". Scientific Reports 10 (1): 2003. February 2020. doi:10.1038/s41598-020-58904-9. PMID 32029878. Bibcode: 2020NatSR..10.2003B. 
  20. "Natural products of filamentous fungi: enzymes, genes, and their regulation". Natural Product Reports 24 (2): 393–416. April 2007. doi:10.1039/B603084J. PMID 17390002. 
  21. "Genome mining for natural product biosynthetic gene clusters in the Subsection V cyanobacteria". BMC Genomics 16 (1): 669. September 2015. doi:10.1186/s12864-015-1855-z. PMID 26335778. 
  22. "Heterologous expression of natural product biosynthetic gene clusters in Streptomyces coelicolor: from genome mining to manipulation of biosynthetic pathways". Journal of Industrial Microbiology & Biotechnology 41 (2): 425–431. February 2014. doi:10.1007/s10295-013-1348-5. PMID 24096958. 
  23. "GenBank". Nucleic Acids Research 49 (D1): D92–D96. January 2021. doi:10.1093/nar/gkaa1023. PMID 33196830. 
  24. "IMG-ABC". https://img.jgi.doe.gov/cgi-bin/abc/main.cgi. 
  25. "IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase". Nucleic Acids Research 48 (D1): D422–D430. January 2020. doi:10.1093/nar/gkz932. PMID 31665416. 
  26. "BIG-FAM". https://bigfam.bioinformatics.nl/. 
  27. "BiG-FAM: the biosynthetic gene cluster families database". Nucleic Acids Research 49 (D1): D490–D497. January 2021. doi:10.1093/nar/gkaa812. PMID 33010170. 
  28. "DoBISCUIT". https://www.nite.go.jp/en/nbrc/genome/dobiscuit.html. 
  29. "DoBISCUIT: a database of secondary metabolite biosynthetic gene clusters". Nucleic Acids Research 41 (Database issue): D408–D414. January 2013. doi:10.1093/nar/gks1177. PMID 23185043. 
  30. "MIBiG". https://mibig.secondarymetabolites.org/. 
  31. "MIBiG 2.0: a repository for biosynthetic gene clusters of known function". Nucleic Acids Research 48 (D1): D454–D458. January 2020. doi:10.1093/nar/gkz882. PMID 31612915. 
  32. "iTOL". https://itol.embl.de/. 
  33. "Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees". Nucleic Acids Research 44 (W1): W242–W245. July 2016. doi:10.1093/nar/gkw290. PMID 27095192.