Biology:List of biological databases

From HandWiki
Revision as of 21:38, 16 March 2024 by Jslovo

Biological databases are stores of biological information.[1] The journal Nucleic Acids Research regularly publishes special issues on biological databases and has a list of such databases. The 2018 issue has a list of about 180 such databases and updates to previously described databases.[2] Omics Discovery Index can be used to browse and search several biological databases. Furthermore, the NIAID Data Ecosystem Discovery Portal developed by the National Institute of Allergy and Infectious Diseases (NIAID) enables searching across databases.

Meta databases

Meta databases are databases of databases: they collect data about data in order to generate new data. They can merge information from different sources and make it available in a new and more convenient form, or with an emphasis on a particular disease or organism. A metadatabase is a database model for metadata management, global query of independent databases, and distributed data processing. The word "metadatabase" is an addition to the dictionary; originally, metadata was simply a common term for data about data, such as tags, keywords, and markup headers.
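The merging step described above can be sketched in a few lines of Python. This is a minimal illustration only: the two sources, the gene symbols, and the field names are all invented placeholders, and real meta databases use far richer schemas and identifier reconciliation.

```python
from collections import defaultdict

# Hypothetical records from two made-up source databases, keyed by gene symbol.
SOURCE_A = [
    {"gene": "GENE1", "disease": "disease X"},
    {"gene": "GENE2", "disease": "disease Y"},
]
SOURCE_B = [
    {"gene": "GENE1", "organism": "Homo sapiens"},
    {"gene": "GENE3", "organism": "Mus musculus"},
]


def merge_sources(sources):
    """Merge records from several sources into one entry per gene.

    `sources` maps a source name to a list of record dicts. Each merged
    entry also records which sources contributed to it, i.e. data about
    the data.
    """
    merged = defaultdict(lambda: {"sources": []})
    for name, records in sources.items():
        for record in records:
            entry = merged[record["gene"]]
            entry["sources"].append(name)
            for key, value in record.items():
                if key != "gene":
                    entry[key] = value
    return dict(merged)


meta = merge_sources({"source_a": SOURCE_A, "source_b": SOURCE_B})
```

Here `meta["GENE1"]` combines the disease annotation from one source with the organism annotation from the other, and its `sources` list shows where each piece came from.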

Model organism databases

Model organism databases provide in-depth biological data for intensively studied organisms.

Nucleic acid databases

DNA databases

The primary databases make up the International Nucleotide Sequence Database (INSD). They include:

DDBJ (Japan), GenBank (USA) and European Nucleotide Archive (Europe) are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions, and then exchange new and updated data on a daily basis to achieve optimal synchronisation between them. These three databases are primary databases, as they house original sequence data. They collaborate with Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.
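The idea of exchanging new and updated records so that all three archives converge on the same content can be sketched as follows. This is a toy model, not the actual exchange mechanism used by the archives: the accessions, version numbers, and sequences are invented, and the "newest version wins" rule is an assumption made for illustration.

```python
def synchronise(*archives):
    """Return one record set holding the newest version of every accession.

    Each archive is a dict mapping accession -> (version, sequence).
    """
    latest = {}
    for archive in archives:
        for accession, (version, sequence) in archive.items():
            if accession not in latest or version > latest[accession][0]:
                latest[accession] = (version, sequence)
    return latest


# Made-up accessions and sequences, for illustration only.
ddbj = {"ACC_0001": (1, "ACGT")}
genbank = {"ACC_0001": (2, "ACGTT"), "ACC_0002": (1, "GGCC")}
ena = {"ACC_0003": (1, "TTAA")}

shared = synchronise(ddbj, genbank, ena)

# After the exchange, each archive holds its own copy of the same records.
ddbj, genbank, ena = (dict(shared) for _ in range(3))
```

After the call, all three dictionaries contain the same three accessions, with `ACC_0001` at its updated version.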

Secondary databases include:

  • 23andMe's database
  • HapMap
  • OMIM (Online Mendelian Inheritance in Man): inherited diseases
  • RefSeq
  • 1000 Genomes Project: launched in January 2008. The genomes of more than a thousand anonymous participants from a number of different ethnic groups were analyzed and made publicly available.
  • EggNOG Database: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. It provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation.[6][7]

Other databases

Gene expression databases

Generic gene expression databases

Microarray gene expression databases

Genome databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold many species genomes, or a single model organism genome.


Phenotype databases

RNA databases

Amino acid / protein databases

(See also: List of proteins in the human body)

Several publicly available data repositories and resources have been developed to support and manage protein-related information, biological knowledge discovery and data-driven hypothesis generation.[15] The databases in the table below are selected from the databases listed in the Nucleic Acids Research (NAR) database issues and database collection and from the databases cross-referenced in UniProtKB. Most of these databases are cross-referenced with UniProt / UniProtKB so that identifiers can be mapped to each other.[15]
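Mapping identifiers through a shared hub of cross-references can be illustrated with a small sketch. The table below is hypothetical: the accessions, database names, and identifiers are invented placeholders, and this is not the UniProt API, only a model of the lookup pattern.

```python
# Hypothetical cross-reference table: hub protein accession -> identifiers
# in other databases. All names and identifiers are invented placeholders.
CROSS_REFS = {
    "PROT_0001": {"structure_db": ["STRUCT_A", "STRUCT_B"], "family_db": ["FAM_10"]},
    "PROT_0002": {"family_db": ["FAM_10"]},
}


def build_reverse_index(cross_refs):
    """Index (database, identifier) -> list of hub accessions citing it."""
    reverse = {}
    for accession, refs in cross_refs.items():
        for database, identifiers in refs.items():
            for identifier in identifiers:
                reverse.setdefault((database, identifier), []).append(accession)
    return reverse


def map_identifier(cross_refs, reverse, source_db, identifier, target_db):
    """Map an identifier in source_db to identifiers in target_db via the hub."""
    targets = []
    for accession in reverse.get((source_db, identifier), []):
        for target in cross_refs[accession].get(target_db, []):
            if target not in targets:
                targets.append(target)
    return targets


REVERSE = build_reverse_index(CROSS_REFS)
structures = map_identifier(CROSS_REFS, REVERSE, "family_db", "FAM_10", "structure_db")
```

Because both external databases point at the same hub accession, a family identifier can be translated into structure identifiers without the two databases referencing each other directly.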

Proteins in humans:

There are about 20,000 protein-coding genes in the standard human genome (roughly 1,200 of them already have Wikipedia articles, through the Gene Wiki). Including splice variants, there could be as many as 500,000 unique human proteins.[16]
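As a quick worked check of what these approximate figures imply, the ratios can be computed directly. The numbers are the rounded estimates quoted in the text, so the results are order-of-magnitude indications only.

```python
protein_coding_genes = 20_000   # approximate figure from the text
max_unique_proteins = 500_000   # upper estimate from the text
genes_with_articles = 1_200     # approximate figure from the text

# Average number of distinct protein forms per gene at the upper estimate.
proteins_per_gene = max_unique_proteins / protein_coding_genes

# Share of protein-coding genes that already have an article.
article_coverage = genes_with_articles / protein_coding_genes

print(f"up to about {proteins_per_gene:.0f} protein forms per gene")
print(f"about {article_coverage:.0%} of genes have an article")
```

At the upper estimate this works out to about 25 protein forms per gene, with about 6% of genes covered by an article.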

Different types of protein databases

DB name DB website Provider Data sources Revenue/Sponsors sources Integrates Wiki article Desc. Size DB type Actively maintained
InterPro http://www.ebi.ac.uk/interpro/ ELIXIR infrastructure European Bioinformatics Institute EMBL, the Wellcome Trust, BBSRC CATH-Gene3D, CDD, HAMAP, MobiDB, PANTHER, Pfam, SMART, SUPERFAMILY, SFLD, TIGRFAMs InterPro classifies proteins into families and predicts the presence of domains and sites Protein sequence databases Yes
neXtProt https://www.nextprot.org/ CALIPHO (a group at the SIB) Swiss Institute of Bioinformatics https://www.sib.swiss/about/funding-sources UniProt, Cellosaurus, gnomAD, IntAct, SRAA Atlas, UniProt-GOA, Bgee, COSMIC, MassIVE, PeptideAtlas neXtProt a human protein-centric knowledge resource Protein sequence databases Yes
Wiki-Pi http://severus.dbmi.pitt.edu/wiki-pi/ Madhavi K. Ganapathiraju At present, Wiki-Pi contains 48,419 unique interactions among 10,492 proteins. However, it is not clear whether these are unique proteins.[13] Protein interaction database ??
Human Protein Reference Database Institute of Bioinformatics (IOB), Bangalore, India Human Protein Reference Database One source claims 15,000 proteins,[17] but it is unclear how many of these are unique.
Sanger Institute Pfam protein families database of alignments and HMMs Protein sequence databases
Human Proteinpedia Institute of Bioinformatics (IOB), Bangalore and Johns Hopkins University Human Proteinpedia Human Proteinpedia is based on HPRD (Human Protein Reference Database), which is a repository hosting over 30,000 human proteins. However, it is unclear how many of these are unique proteins.
Human Protein Atlas The Swedish Government Human Protein Atlas It contains roughly 10 million IHC images for slightly fewer than 25,000 antibodies, but again it is unclear how many of these are unique.
Manchester University PRINTS a compendium of protein fingerprints Protein sequence databases
PROSITE database of protein families and domains Protein sequence databases
Georgetown University Medical Center [GUMC] Protein Information Resource Protein sequence databases
SUPERFAMILY library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms Protein sequence databases
Swiss Institute of Bioinformatics Swiss-Prot protein knowledgebase Protein sequence databases
NCBI protein sequence and knowledgebase (National Center for Biotechnology Information) Protein sequence databases
Protein DataBank in Europe (PDBe),[18] ProteinDatabank in Japan (PDBj),[19] Research Collaboratory for Structural Bioinformatics (RCSB)[20] Protein Data Bank (PDB) Protein structure databases
Structural Classification of Proteins (SCOP) Protein structure databases
Protein Structure Classification database CATH Protein structure databases
Sali Lab, UCSF ModBase database of comparative protein structure models Protein model databases
Similarity Matrix of Proteins SIMAP database of protein similarities computed using FASTA Protein model databases
Swiss-model server and repository for protein structure models Protein model databases
AAindex database of amino acid indices, amino acid mutation matrices, and pair-wise contact potentials Protein model databases
Samuel Lunenfeld Research Institute BioGRID general repository for interaction datasets Protein-protein and other molecular interactions
RNA-binding protein database Protein-protein and other molecular interactions
Univ. of California Database of Interacting Proteins Protein-protein and other molecular interactions
(EMBL-EBI) IntAct:[21] open-source database for molecular interactions Protein-protein and other molecular interactions
STRING an open-source molecular interaction database to study interactions between proteins Protein-protein and other molecular interactions
Human Protein Atlas aims at mapping all the human proteins in cells, tissues and organs Protein expression databases
ProteinModelPortal Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase ?? ?? 3D structure protein databases
SMR Database of annotated 3D protein structure models University of Basel The Swiss government 3D structure protein databases
DisProt Database of Protein Disorder ELIXIR infrastructure Indiana University School of Medicine, Temple University, University of Padua funding from the European Union's Horizon 2020 Swiss-Prot/UniProt, CATH, Pfam, Europe PMC, BITEM, ECO, Gene Ontology DisProt database of experimental evidence of disorder in proteins 3D structure protein databases, Protein sequence databases
MobiDB Database of intrinsically disordered and mobile proteins John Moult, Christine Orengo, Predrag Radivojac University of Padua Italian Government MobiDB database of intrinsic protein disorder annotation 3D structure protein databases, Protein sequence databases
ModBase Database of Comparative Protein Structure Models Ursula Pieper, Ben Webb, Narayanan Eswar, Andrej Sali Roberto Sanchez UCSF, Sali Lab 3D structure protein databases
PDBsum Pictorial database of 3D structures in the Protein Data Bank European Bioinformatics Institute 2013 Wellcome Trust 3D structure protein databases
CCDS The Consensus CDS protein set database NCBI ?? Sequence databases
DDBJ DNA Data Bank of Japan ?? ?? Sequence databases
ENA European Nucleotide Archive ?? ?? Sequence databases
GenBank GenBank nucleotide sequence database ?? ?? Sequence databases
RefSeq NCBI Reference Sequence Database ?? ?? Sequence databases
UniGene Database of computationally identified transcripts from the same locus ?? ?? Sequence databases
UniProtKB Universal Protein Resource (UniProt) ?? ?? Sequence databases
Swiss-Prot/UniProt https://www.sib.swiss/swiss-prot and https://www.uniprot.org/ SIB Swiss Institute of Bioinformatics European Bioinformatics Institute (EMBL-EBI) Swiss-Prot has collected over 81,000 variants in roughly 13,000 human protein sequence records from peer-reviewed literature. It is unclear how many unique protein types are present in the database.

Signal transduction pathway databases

Metabolic pathway and protein function databases

Taxonomic databases

Main page: Biology:List of biodiversity databases

Numerous databases collect information about species and other taxonomic categories. The Catalogue of Life is a special case as it is a meta-database of about 150 specialized "global species databases" (GSDs) that have collected the names and other information on (almost) all described and thus "known" species.

  • BacDive: bacterial metadatabase that provides strain-linked information about bacterial and archaeal biodiversity, including taxonomy information
  • Catalogue of Life: a meta-database of all species on earth
  • EzTaxon-e: database for the identification of prokaryotes based on 16S ribosomal RNA gene sequences
  • NCBI Taxonomy: a taxonomic database operated by NCBI and concentrating on all taxa for which DNA sequences are available (those sequences are stored by GenBank, another database operated by NCBI).

Image databases

Images play a critical role in biomedicine and in fields ranging from anthropology to zoology. However, there are relatively few databases dedicated to image collection, although some projects such as iNaturalist collect photos as a main part of their data. A special case of "images" is 3-dimensional images such as protein structures or 3D reconstructions of anatomical structures. Image databases include, among others:[22]

  • Allen Brain Atlas
  • Digital Brain Bank[23]
  • Electron Microscopy Public Image Archive (EMPIAR)[24]
  • Image Data Resource[22]
  • Morphobank
  • Morphosource

Radiologic databases

Additional databases

Exosomal databases

  • ExoCarta
  • Extracellular RNA Atlas: a repository of small RNA-seq and qPCR-derived exRNA profiles from human and mouse biofluids

Mathematical model databases

  • Biomodels Database: published mathematical models describing biological processes

Databases on antimicrobial resistance rates and antibiotic consumption

Databases on antimicrobial resistance mechanisms

Wiki-style databases

Specialized databases

References

  1. "Databases, data tombs and dust in the wind". Bioinformatics 24 (19): 2127–8. October 2008. doi:10.1093/bioinformatics/btn464. PMID 18819940. 
  2. "Volume 46 Issue D1 | Nucleic Acids Research | Oxford Academic" (in en). https://academic.oup.com/nar/issue/46/D1. 
  3. "PomBase 2018: user-driven reimplementation of the fission yeast database provides rapid and intuitive access to diverse, interconnected information". Nucleic Acids Research 47 (D1): D821–D827. January 2019. doi:10.1093/nar/gky961. PMID 30321395. 
  4. "SubtiWiki in 2018: from genes and proteins to functional network annotation of the model organism Bacillus subtilis". Nucleic Acids Research 46 (D1): D743–D748. January 2018. doi:10.1093/nar/gkx908. PMID 29788229. 
  5. Margarita Garcia-Hernandez; Tanya Berardini; Guanghong Chen; Debbie Crist; Aisling Doyle; Eva Huala; Emma Knee; Mark Lambrecht et al. (November 2002). "TAIR: a resource for integrated Arabidopsis data". Functional & Integrative Genomics 2 (6): 239–253. doi:10.1007/s10142-002-0077-z. PMID 12444417. 
  6. "eggNOG v4.0: nested orthology inference across 3686 organisms". Nucleic Acids Research 42 (Database issue): D231-9. January 2014. doi:10.1093/nar/gkt1253. PMID 24297252. 
  7. "eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses". Nucleic Acids Research 47 (D1): D309–D314. January 2019. doi:10.1093/nar/gky1085. PMID 30418610. 
  8. ArrayExpress
  9. GEO
  10. "The Human Protein Atlas". https://www.proteinatlas.org/. 
  11. "Legume information system (LegumeInfo.org): a key component of a set of federated data resources for the legume family". Nucleic Acids Research 44 (D1): D1181-8. January 2016. doi:10.1093/nar/gkv1159. PMID 26546515. 
  12. "Saccharomyces Genome Database | SGD". https://www.yeastgenome.org/. 
  13. "SoyBase, the USDA-ARS soybean genetics and genomics database". Nucleic Acids Research 38 (Database issue): D843-6. January 2010. doi:10.1093/nar/gkp798. PMID 20008513. 
  14. "IRESbase". http://reprod.njmu.edu.cn/cgi-bin/iresbase/index.php. 
  15. "Protein Bioinformatics Databases and Resources". Protein Bioinformatics. Methods in Molecular Biology. 1558. New York, NY: Springer New York. 2017. pp. 3–39. doi:10.1007/978-1-4939-6783-4_1. ISBN 978-1-4939-6781-0. 
  16. Karnkowska, Anna; Treitli, Sebastian C.; Brzoň, Ondřej; Novák, Lukáš; Vacek, Vojtěch; Soukal, Petr; Barlow, Lael D.; Herman, Emily K. et al. (2019). "The Oxymonad Genome Displays Canonical Eukaryotic Complexity in the Absence of a Mitochondrion". Molecular Biology and Evolution 36 (10): 2292–2312. doi:10.1093/molbev/msz147. PMID 31387118. 
  17. Keshava Prasad, T. S.; Goel, R.; Kandasamy, K.; Keerthikumar, S.; Kumar, S.; Mathivanan, S.; Telikicherla, D.; Raju, R. et al. (2008). "Human Protein Reference Database—2009 update". Nucleic Acids Research 37 (Database issue): D767–D772. doi:10.1093/nar/gkn892. PMID 18988627. 
  18. "PDBe: towards reusable data delivery infrastructure at protein data bank in Europe". Nucleic Acids Research 46 (D1): D486–D492. January 2018. doi:10.1093/nar/gkx1070. PMID 29126160. 
  19. "Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures". Nucleic Acids Research 45 (D1): D282–D288. January 2017. doi:10.1093/nar/gkw962. PMID 27789697. 
  20. "The RCSB protein data bank: integrative view of protein, gene and 3D structural information". Nucleic Acids Research 45 (D1): D271–D281. January 2017. doi:10.1093/nar/gkw1000. PMID 27794042. 
  21. "IntAct: an open source molecular interaction database". Nucleic Acids Research 32 (Database issue): D452-5. January 2004. doi:10.1093/nar/gkh052. PMID 14681455. 
  22. "A call for public archives for biological image data". Nature Methods 15 (11): 849–854. November 2018. doi:10.1038/s41592-018-0195-8. PMID 30377375. 
  23. "The Digital Brain Bank, an open access platform for post-mortem imaging datasets". eLife 11: e73153. March 2022. doi:10.7554/eLife.73153. PMID 35297760. 
  24. "EMPIAR: a public archive for raw electron microscopy image data". Nature Methods 13 (5): 387–388. May 2016. doi:10.1038/nmeth.3806. PMID 27067018. 
  25. Crickmore, N.; Berry, C.; Panneerselvam, S.; Mishra, R.; Connor, T. R.; Bonning, B. C. (November 2021). "A structure-based nomenclature for Bacillus thuringiensis and other bacteria-derived pesticidal proteins". Journal of Invertebrate Pathology 186 (D1): 107438. doi:10.1016/j.jip.2020.107438. PMID 32652083. 
  26. Panneerselvam S; Mishra R; Berry C; Crickmore N; Bonning BC (2022). "BPPRC database: a web-based tool to access and analyse bacterial pesticidal proteins.". Database (Oxford) 186 (D1): 107438. doi:10.1093/database/baac022. PMID 35396594. 
  27. "HRT Atlas v1.0 database: redefining human and mouse housekeeping genes and candidate reference transcripts by mining massive RNA-seq datasets". Nucleic Acids Research 49 (D1): D947–D955. January 2021. doi:10.1093/nar/gkaa609. PMID 32663312. 
  28. (IHEC) data portal
  29. CEEHRC
  30. Blueprint
  31. EGA
  32. DEEP
  33. CREST
  34. "Sharing epigenomes globally" (in En). Nature Methods 15 (3): 151. 2018. doi:10.1038/nmeth.4630. ISSN 1548-7105. 
  35. "MetOSite: an integrated resource for the study of methionine residues sulfoxidation". Bioinformatics 35 (22): 4849–4850. November 2019. doi:10.1093/bioinformatics/btz462. PMID 31197322. 

External links