Biology:SNP annotation

SNP annotation
Classification: Bioinformatics
Subclassification: Single-nucleotide polymorphism
Type of tools used: Functional annotation tools
Other subjects related: Genome project, Genomics

Single nucleotide polymorphism annotation (SNP annotation) is the process of predicting the effect or function of an individual SNP using SNP annotation tools. In SNP annotation the biological information is extracted, collected and displayed in a clear form amenable to query. SNP functional annotation is typically performed based on the available information on nucleic acid and protein sequences.[1]

Introduction

Directed graph of relationships among SNP prediction webservers and their bioinformatics sources.[2]

Single nucleotide polymorphisms (SNPs) play an important role in genome-wide association studies because they act as primary biomarkers. SNPs are currently the marker of choice because they occur in large numbers in virtually all populations. The location of these biomarkers can be very important for predicting functional significance, genetic mapping and population genetics.[3] Each SNP represents a nucleotide change between two individuals at a defined location. SNPs are the most common genetic variant found in all individuals, with one SNP every 100–300 bp in some species.[4] Because the genome contains a massive number of SNPs, there is a clear need to prioritize SNPs according to their potential effect in order to expedite genotyping and analysis.[5]

Annotating large numbers of SNPs is a difficult and complex process that needs computational methods to handle such large datasets. Many tools have been developed for SNP annotation in different organisms: some are optimized for organisms densely sampled for SNPs (such as humans), but few are currently species non-specific or support non-model organism data. The majority of SNP annotation tools provide computationally predicted putative deleterious effects of SNPs. These tools examine whether a SNP resides in functional genomic regions such as exons, splice sites, or transcription regulatory sites, and predict the potential corresponding functional effects using a variety of machine-learning approaches. However, the tools and systems that prioritize functionally significant SNPs suffer from a few limitations. First, they examine the putative deleterious effects of SNPs with respect to a single biological function, which provides only partial information about the functional significance of SNPs. Second, current systems classify SNPs only into a deleterious or a neutral group.[6]

Many annotation algorithms focus on single nucleotide variants (SNVs), considered more rare than SNPs as defined by their minor allele frequency (MAF).[7][8] As a consequence, training data for the corresponding prediction methods may be different and hence one should be careful to select the appropriate tool for a specific purpose. For the purposes of this article, "SNP" will be used to mean both SNP and SNV, but readers should bear in mind the differences.
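The distinction by minor allele frequency can be illustrated with a minimal sketch. The 1% cutoff used below is a commonly used convention and is an assumption here, not a value taken from the sources cited above; the function names are hypothetical.

```python
from collections import Counter

def minor_allele_frequency(alleles):
    """Return the frequency of the second most common allele at one site."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    return counts[1][1] / len(alleles)

def label_by_maf(maf, threshold=0.01):
    """Label a variant as common or rare; the threshold is an assumed convention."""
    return "common" if maf >= threshold else "rare"

# Toy sample of 100 observed alleles at one site (invented data).
alleles = list("A" * 95 + "G" * 5)
maf = minor_allele_frequency(alleles)
print(maf, label_by_maf(maf))
```

A tool trained mostly on common variants may behave differently on variants that this kind of filter labels as rare, which is why the choice of tool should match the frequency class of interest.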

SNP annotation

Different types of annotation in genomics

For SNP annotation, many kinds of genetic and genomic information are used. Based on the different features used by each annotation tool, SNP annotation methods may be split roughly into the following categories:

Gene based annotation

Genomic information from surrounding genomic elements is among the most useful information for interpreting the biological function of an observed variant. Information from a known gene is used as a reference to indicate whether the observed variant resides in or near a gene and whether it has the potential to disrupt the protein sequence and its function. Gene-based annotation relies on the fact that non-synonymous mutations can alter the protein sequence and that splice-site mutations may disrupt the transcript splicing pattern.[9]
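As a minimal illustration of the gene-based idea, the following sketch classifies a single-base change in a coding sequence as synonymous, missense or nonsense using the standard genetic code. The function name and the toy coding sequence are hypothetical; real tools additionally handle strand, multiple transcript models and splice sites.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def coding_consequence(cds, pos, alt):
    """Classify a substitution at 0-based position pos of an in-frame CDS."""
    start = pos - pos % 3                     # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"                     # premature stop codon gained
    if ref_aa == "*":
        return "stop_lost"
    return "missense"

cds = "ATGGAAGTGTAA"                          # toy CDS: ATG GAA GTG TAA
print(coding_consequence(cds, 5, "G"))        # GAA -> GAG
print(coding_consequence(cds, 4, "T"))        # GAA -> GTA
print(coding_consequence(cds, 3, "T"))        # GAA -> TAA
```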

Knowledge based annotation

Knowledge-based annotation is based on information about gene attributes, protein function and metabolism. In this type of annotation, more emphasis is given to genetic variation that disrupts protein functional domains, protein-protein interactions and biological pathways. The non-coding regions of the genome contain many important regulatory elements, including promoters, enhancers and insulators; any change in these regulatory regions can change the functionality of the corresponding protein.[10] A mutation in DNA can change the RNA sequence and thereby influence RNA secondary structure, RNA-binding protein recognition and miRNA binding activity.[11][12]

Functional annotation

This method mainly identifies variant function based on whether the variant locus lies in a known functional region that harbors genomic or epigenomic signals. The functions of non-coding variants are extensive in terms of the affected genomic regions, and they are involved in almost all processes of gene regulation, from the transcriptional to the post-translational level.[13]

Transcriptional gene regulation

Transcriptional gene regulation depends on many spatial and temporal factors in the nucleus, such as global or local chromatin states, nucleosome positioning, transcription factor (TF) binding and enhancer/promoter activities. Variants that alter any of these biological processes may alter gene regulation and cause phenotypic abnormality.[14] Genetic variants located in distal regulatory regions can affect the binding motifs of TFs, chromatin regulators and other distal transcriptional factors, which disturbs the interaction between an enhancer or silencer and its target gene.[15]
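One common way to estimate whether a variant affects a TF binding motif is to score the reference and alternate sequences against a position weight matrix (PWM) and compare the results. The sketch below assumes a small, invented four-position matrix and a uniform background; it does not correspond to any real TF or to the scoring scheme of any specific tool.

```python
import math

# Hypothetical PWM: per-position base probabilities for a 4-bp motif.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency

def log_odds(seq):
    """Sum of per-position log2 odds of the motif model versus background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, seq))

def motif_score_change(ref_seq, alt_seq):
    """Negative values suggest the alternate allele weakens the motif match."""
    return log_odds(alt_seq) - log_odds(ref_seq)

delta = motif_score_change("ACGT", "ATGT")  # C>T at the second motif position
print(round(delta, 3))
```

A strongly negative change only indicates a weaker match to the motif model; whether binding is actually lost depends on chromatin context and other factors described above.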

Alternative splicing

Alternative splicing is one of the most important contributors to the functional complexity of the genome. Modified splicing has a significant effect on phenotypes relevant to disease or drug metabolism. A change in splicing can be caused by modifying any of the components of the splicing machinery, such as splice sites, splice enhancers or splice silencers.[16] Modification of an alternative splice site can lead to a different protein form with a different function. Humans use an estimated 100,000 different proteins or more, so some genes must be capable of coding for far more than one protein. Alternative splicing occurs more frequently than was previously thought and can be hard to control; genes may produce tens of thousands of different transcripts, necessitating a new gene model for each alternative splice.
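The simplest splice-related check is whether a variant destroys the canonical GT donor or AG acceptor dinucleotide at the ends of an intron. The sketch below covers only that case, with a hypothetical function name and an invented intron sequence; it ignores splice enhancers, silencers and non-canonical introns.

```python
def disrupts_canonical_splice_site(intron, offset, alt):
    """Check a substitution at a 0-based offset within an intron.

    The intron is given 5'->3' on the transcript strand. Returns True when a
    canonical GT...AG intron loses its donor (GT) or acceptor (AG) dinucleotide.
    """
    was_canonical = intron[:2] == "GT" and intron[-2:] == "AG"
    mutated = intron[:offset] + alt + intron[offset + 1:]
    still_canonical = mutated[:2] == "GT" and mutated[-2:] == "AG"
    return was_canonical and not still_canonical

intron = "GTAAGTCTTTCAG"  # toy intron (invented)
print(disrupts_canonical_splice_site(intron, 0, "A"))                # donor G>A
print(disrupts_canonical_splice_site(intron, 5, "C"))                # deep intronic
print(disrupts_canonical_splice_site(intron, len(intron) - 1, "T"))  # acceptor G>T
```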

RNA processing and post transcriptional regulation

Mutations in the untranslated regions (UTRs) affect many post-transcriptional regulatory processes. Distinctive structural features are required for many RNA molecules and cis-acting regulatory elements to execute effective functions during gene regulation. SNVs can alter the secondary structure of RNA molecules and thereby disrupt the proper folding of RNAs, such as tRNA/mRNA/lncRNA folding and miRNA binding recognition regions.[17]

Translation and post translational modifications

Single nucleotide variants can also affect cis-acting regulatory elements in mRNAs to inhibit or promote translation initiation. A change between synonymous codons due to mutation may affect translation efficiency because of codon usage bias. Translation elongation can also be slowed by mutations along the ramp of ribosomal movement. At the post-translational level, genetic variants can contribute to proteostasis and amino acid modifications. However, the mechanisms of variant effects in this field are complicated, and only a few tools are available to predict a variant's effect on translation-related modifications.[18]

Protein function

Non-synonymous variants are variants in exons that change the amino acid sequence encoded by the gene, including single base changes and non-frameshift indels. The effect of non-synonymous variants on proteins has been investigated extensively, and many algorithms have been developed to predict the deleteriousness and pathogenicity of single nucleotide variants (SNVs). Classical bioinformatics tools, such as SIFT, PolyPhen and MutationTaster, successfully predict the functional consequence of non-synonymous substitutions.[19][20][21][22] The PopViz webserver provides a gene-centric approach to visualize mutation damage prediction scores (CADD, SIFT, PolyPhen-2) or population genetics (minor allele frequency) against the amino acid positions of all coding variants of a given human gene.[23] PopViz is also cross-linked with the UniProt database, where protein domain information can be found, making it possible to identify the predicted deleterious variants that fall into these protein domains on the PopViz plot.[23]

Evolutionary conservation and natural selection

Comparative genomics approaches are used to predict function-relevant variants under the assumption that a functional genetic locus should be conserved across different species at an extensive phylogenetic distance. On the other hand, some adaptive traits and population differences are driven by positive selection of advantageous variants, and these genetic mutations are functionally relevant to population-specific phenotypes. Functional prediction of variants' effects in different biological processes is pivotal for pinpointing the molecular mechanisms of diseases and traits and for directing experimental validation.[24]
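The conservation assumption can be made concrete with a simple per-column score over a multiple-species alignment. The sketch below uses one minus the normalized Shannon entropy of a nucleotide column; this is an illustrative measure chosen here, not the score used by phyloP, phastCons or any other tool named in this article, and it ignores gaps and phylogenetic distance.

```python
import math
from collections import Counter

def column_conservation(column):
    """Return 1 - normalized Shannon entropy for one nucleotide alignment column.

    1.0 means the same base in every species; 0.0 means all four bases are
    equally frequent. Gaps are assumed to be absent.
    """
    n = len(column)
    counts = Counter(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 1.0 - entropy / 2.0  # maximum entropy for four bases is 2 bits

# Toy columns, one character per species (invented data).
print(column_conservation("AAAAAAAA"))
print(column_conservation("AAGG"))
print(column_conservation("ACGT"))
```

Under this assumption, a variant at a column scoring close to 1.0 would be prioritized over one at a column scoring close to 0.0.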

List of available SNP annotation tools

A large number of SNP annotation tools are currently available to annotate the vast amounts of available NGS data. Some are specific to particular kinds of SNPs, while others are more general. Available SNP annotation tools include SNPeff, Ensembl Variant Effect Predictor (VEP), ANNOVAR, FATHMM, PhD-SNP, PolyPhen-2, SuSPect, F-SNP, AnnTools, SeattleSeq, SNPit, SCAN, SNAP, SNPs&GO, LS-SNP, Snat, TREAT, TRAMS, Maviant, MutationTaster, SNPdat, Snpranker, NGS – SNP, SVA, VARIANT, SIFT, LIST-S2 and FAST-SNP. The functions and approaches used in SNP annotation tools are listed below.

Tools | Description | External resources used | Website URL | References
PhyreRisk | Maps genetic variants onto experimental and predicted protein structures | Variant effect predictor, UniProt, Protein Data Bank, SIFTS, Phyre2 for predicted structures | http://phyrerisk.bc.ic.ac.uk/home | [25]
Missense3D | Reports structural impact of a missense variant onto PDB and user-supplied protein coordinates. Developed to be applicable to experimental and predicted protein structures | Protein Data Bank, Phyre2 for predicted structures | http://www.sbg.bio.ic.ac.uk/~missense3d/ | [26]
SNPeff | SnpEff annotates variants based on their genomic locations and predicts coding effects. Uses an interval forest approach | ENSEMBL, UCSC and organism based e.g. FlyBase, WormBase and TAIR | http://snpeff.sourceforge.net/SnpEff_manual.html | [27]
Ensembl VEP | Determines effects of variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, proteins and regulatory regions | dbSNP, RefSeq, UniProt, COSMIC, PDBe, 1000 Genomes, gnomAD, PubMed | https://www.ensembl.org/info/docs/tools/vep/index.html | [28]
ANNOVAR | This tool is suitable for pinpointing a small subset of functionally important variants. Uses mutation prediction approach for annotation | UCSC, RefSeq and Ensembl | http://annovar.openbioinformatics.org/ | [29]
Jannovar | This is a tool and library for genome annotation | RefSeq, Ensembl, UCSC, etc. | https://github.com/charite/jannovar | [30]
PhD-SNP | SVM-based method using sequence information retrieved by BLAST algorithm | UniRef90 | http://snps.biofold.org/phd-snp/ | [31]
PolyPhen-2 | Suitable for predicting damaging effects of missense mutations. Uses sequence conservation, structure to model position of amino acid substitution, and SWISS-PROT annotation | UniProt | http://genetics.bwh.harvard.edu/pph2/ | [32]
MutationTaster | Suitable for predicting damaging effects of all intragenic mutations (DNA and protein level), including InDels | Ensembl, 1000 Genomes Project, ExAC, UniProt, ClinVar, phyloP, phastCons, nnsplice, polyadq (...) | http://www.mutationtaster.org/ | [33]
SuSPect | An SVM-trained predictor of the damaging effects of missense mutations. Uses sequence conservation, structure and network (interactome) information to model phenotypic effect of amino acid substitution. Accepts VCF file | UniProt, PDB, Phyre2 for predicted structures, DOMINE and STRING for interactome | http://www.sbg.bio.ic.ac.uk/suspect/index.html | [34]
F-SNP | Computationally predicts functional SNPs for disease association studies | PolyPhen, SIFT, SNPeffect, SNPs3D, LS-SNP, ESEfinder, RescueESE, ESRSearch, PESX, Ensembl, TFSearch, Consite, GoldenPath, KinasePhos, OGPET, Sulfinator | http://compbio.cs.queensu.ca/F-SNP/ | [35]
AnnTools | Designed to identify novel SNPs/SNVs, INDELs and SVs/CNVs. AnnTools searches for overlaps with regulatory elements, disease/trait associated loci, known segmental duplications and artifact prone regions | dbSNP, UCSC, GATK refGene, GAD, published lists of common structural genomic variation, Database of Genomic Variants, lists of conserved TFBs, miRNA | http://anntools.sourceforge.net/ | [36]
SNPit | Analyses the potential functional significance of SNPs derived from genome wide association studies | dbSNP, EntrezGene, UCSC Browser, HGMD, ECR Browser, Haplotter, SIFT | -/- | [37]
SNAP | A neural network-based method for the prediction of the functional effects of non-synonymous SNPs | Ensembl, UCSC, UniProt, Pfam, DAS-CBS, MINT, BIND, KEGG, TreeFam | http://www.rostlab.org/services/SNAP | [38]
SNPs&GO | SVM-based method using sequence information, Gene Ontology annotation and, when available, protein structure | UniRef90, GO, PANTHER, PDB | http://snps.biofold.org/snps-and-go/ | [39]
LS-SNP | Maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models | UniProtKB, Genome Browser, dbSNP, PD | http://www.salilab.org/LS-SNP | [40]
TREAT | TREAT is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing | -/- | http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm | [41]
SNPdat | Suitable for species non-specific or non-model organism data. SNPdat does not require the creation of any local relational databases or pre-processing of any mandatory input files | -/- | https://code.google.com/p/snpdat/downloads/ | [42]
NGS – SNP | Annotates SNPs by comparing the reference amino acid and the non-reference amino acid to each orthologue | Ensembl, NCBI and UniProt | http://stothard.afns.ualberta.ca/downloads/NGS-SNP/ | [43]
SVA | Predicts biological function for identified variants | NCBI RefSeq, Ensembl, variation databases, UCSC, HGNC, GO, KEGG, HapMap, 1000 Genomes Project and DG | http://www.svaproject.org/ | [44]
VARIANT | VARIANT increases the information scope outside the coding regions by including all the available information on regulation, DNA structure, conservation, evolutionary pressures, etc. Regulatory variants constitute a recognized, but still unexplored, cause of pathologies | dbSNP, 1000 genomes, disease-related variants from GWAS, OMIM, COSMIC | http://variant.bioinfo.cipf.es/ | [45]
SIFT | SIFT is a program that uses sequence homology to predict whether an amino acid substitution will affect protein function | PROT/TrEMBL, or NCBI's | http://blocks.fhcrc.org/sift/SIFT.html | [46]
LIST-S2 | LIST-S2 (Local Identity and Shared Taxa, Species-specific) is based on the assumption that variations observed in closely related species are more significant when assessing conservation compared to those in distantly related species | UniProt SwissProt/TrEMBL and NCBI Taxonomy | https://gsponerlab.msl.ubc.ca/software/list/ | [47][48]
FAST-SNP | A web server that allows users to efficiently identify and prioritize high-risk SNPs according to their phenotypic risks and putative functional effects | NCBI dbSNP, Ensembl, TFSearch, PolyPhen, ESEfinder, RescueESE, FAS-ESS, SwissProt, UCSC Golden Path, NCBI Blast and HapMap | http://fastsnp.ibms.sinica.edu.tw/ | [49]
PANTHER | PANTHER relates protein sequence evolution to the evolution of specific protein functions and biological roles. Protein sequences are used to build protein family trees, and a computer-assisted manual curation step is used to better define the protein family clusters | STKE, KEGG, MetaCyc, FREX and Reactome | http://www.pantherdb.org/ | [50]
Meta-SNP | SVM-based meta predictor including 4 different methods | PhD-SNP, PANTHER, SIFT, SNAP | http://snps.biofold.org/meta-snp | [51]
PopViz | Integrative and interactive gene-centric visualization of population genetics and mutation damage prediction scores of human gene variants | gnomAD, Ensembl, UniProt, OMIM, UCSC, CADD, EIGEN, LINSIGHT, SIFT, PolyPhen-2 | http://shiva.rockefeller.edu/PopViz/ | [23]

Algorithms used in annotation tools

Variant annotation tools use machine learning algorithms to predict variant annotations. Different annotation tools use different algorithms. Common algorithms include:

  • Interval/random forest, e.g. MutPred, SNPeff
  • Neural networks, e.g. SNAP
  • Support vector machines, e.g. PhD-SNP, SNPs&GO
  • Bayesian classification, e.g. PolyPhen-2
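Before any classifier is applied, location-based tools must find which genomic feature a variant falls in. The sketch below is a simplified stand-in for that lookup step: it uses a sorted list with binary search over non-overlapping features, which is not an interval forest and would not handle overlapping transcripts. The class name, coordinates and feature names are all hypothetical.

```python
import bisect

class FeatureIndex:
    """Lookup of non-overlapping, half-open [start, end) genomic features."""

    def __init__(self, features):
        # features: iterable of (start, end, name) tuples on one chromosome
        self.features = sorted(features)
        self.starts = [f[0] for f in self.features]

    def lookup(self, pos):
        """Return the name of the feature containing pos, or 'intergenic'."""
        i = bisect.bisect_right(self.starts, pos) - 1
        if i >= 0:
            start, end, name = self.features[i]
            if start <= pos < end:
                return name
        return "intergenic"

# Toy gene model (invented coordinates).
index = FeatureIndex([(100, 200, "exon1"), (200, 500, "intron1"), (500, 650, "exon2")])
for pos in (50, 150, 200, 649, 650):
    print(pos, index.lookup(pos))
```

Binary search keeps each lookup logarithmic in the number of features, which matters when millions of variants are annotated; tree-based interval structures extend the same idea to overlapping features.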

Comparison of variant annotation tools

A large number of variant annotation tools are available. The annotations produced by different tools do not always agree with each other, because the rules for data handling differ between applications. A perfect comparison of the available tools is practically impossible, since not all tools have the same input and output or the same functionality. Below is a table of major annotation tools and their functional area.

Tools | Input file | Output file | SNP | INDEL | CNV | WEB or Program | Source
AnnoVar | VCF, pileup, CompleteGenomics, GFF3-SOLiD, SOAPsnp, MAQ, CASAVA | TXT | Yes | Yes | Yes | Program | [52]
Jannovar | VCF | VCF | Yes | Yes | Yes | Java Program | [53]
SNPeff | VCF, pileup/TXT | VCF, TXT, HTML | Yes | Yes | No | Program | [27]
Ensembl VEP | Ensembl default (coordinates), VCF, variant identifiers, HGVS, SPDI, REST-style regions | VCF, VEP, TXT, JSON | Yes | Yes | Yes | Web, Perl script, REST API | [54]
AnnTools | VCF, pileup, TXT | VCF | Yes | Yes | No | No | [55]
SeattleSeq | VCF, MAQ, CASAVA, GATK BED | VCF, SeattleSeq | Yes | Yes | No | Web | [56]
VARIANT | VCF, GFF2, BED | web report, TXT | Yes | Yes | Yes | Web | [57]

[58]
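Most of the tools compared above accept VCF input. As a minimal sketch of what such input contains, the following code splits one VCF data line into its fixed columns (CHROM, POS, ID, REF, ALT) and gives each alternate allele a coarse type. The function name and the example line are hypothetical, and symbolic alleles such as structural-variant tags are not handled.

```python
def parse_vcf_line(line):
    """Parse one tab-separated VCF data line into one record per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    records = []
    for allele in alt.split(","):          # multi-allelic sites list ALTs by comma
        if len(ref) == 1 and len(allele) == 1:
            kind = "SNV"
        elif len(ref) != len(allele):
            kind = "INDEL"
        else:
            kind = "MNV"                   # same length, more than one base
        records.append({"chrom": chrom, "pos": int(pos), "id": variant_id,
                        "ref": ref, "alt": allele, "type": kind})
    return records

# Invented example line with two alternate alleles.
line = "1\t12345\trs1\tA\tG,AT\t50\tPASS\t."
for record in parse_vcf_line(line):
    print(record)
```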

Application

Different annotations capture diverse aspects of variant function.[59] Simultaneous use of multiple, varied functional annotations could improve the power of rare-variant association analysis in whole-exome and whole-genome sequencing studies.[60] Some tools have been developed to enable functionally informed phenotype-genotype association analysis for common and rare variants by incorporating functional annotations in biobank-scale cohorts.[61][62][63][64]

Conclusions

The next generation of SNP annotation webservers can take advantage of the growing amount of data in core bioinformatics resources and use intelligent agents to fetch data from different sources as needed. From a user's point of view, it is more efficient to submit a set of SNPs and receive results in a single step, which makes meta-servers the most attractive choice. However, if SNP annotation tools deliver heterogeneous data covering sequence, structure, regulation, pathways, etc., they must also provide frameworks for integrating data into decision algorithms, as well as quantitative confidence measures so that users can assess which data are relevant and which are not.
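A very simple form of such integration is a majority vote over the calls of several predictors, with the fraction of agreeing tools reported as a crude confidence measure. The sketch below is only an illustration of that idea: the function name and the example calls are hypothetical, and published meta-predictors such as Meta-SNP use trained models rather than a plain vote.

```python
def consensus_call(predictions):
    """Combine per-tool labels ('deleterious' or 'neutral') by majority vote.

    Returns (label, confidence), where confidence is the fraction of tools
    that agree with the returned label. Ties are reported as 'uncertain'.
    """
    votes = list(predictions.values())
    if not votes:
        raise ValueError("at least one prediction is required")
    n_deleterious = votes.count("deleterious")
    n_neutral = len(votes) - n_deleterious
    if n_deleterious == n_neutral:
        return "uncertain", 0.5
    label = "deleterious" if n_deleterious > n_neutral else "neutral"
    return label, max(n_deleterious, n_neutral) / len(votes)

# Invented calls for one variant; the tool names are used only as labels.
calls = {"SIFT": "deleterious", "PolyPhen-2": "deleterious", "SNAP": "neutral"}
label, confidence = consensus_call(calls)
print(label, round(confidence, 2))
```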

References

  1. "Genome annotation". Plant Physiol. Biochem 29 (3–4): 181–193. 2001. doi:10.1016/S0981-9428(01)01242-6. 
  2. "Next generation tools for the annotation of human SNPs". Briefings in Bioinformatics 10 (1): 35–52. January 2009. doi:10.1093/bib/bbn047. PMID 19181721. 
  3. "SNPit: a federated data integration system for the purpose of functional SNP annotation". Computer Methods and Programs in Biomedicine 95 (2): 181–189. August 2009. doi:10.1016/j.cmpb.2009.02.010. PMID 19327864. 
  4. N. C. Oraguzie, E.H.A. Rikkerink, S.E. Gardiner, H.N. de Silva (eds.), "Association Mapping in Plants", Springer, 2007
  5. "Bioinformatics for personal genome interpretation". Briefings in Bioinformatics 13 (4): 495–512. July 2012. doi:10.1093/bib/bbr070. PMID 22247263. 
  6. P. H. Lee, H. Shatkay, “Ranking single nucleotide polymorphisms by potential deleterious effects”, Computational Biology and Machine Learning Lab, School of Computing, Queen’s University, Kingston, ON, Canada
  7. "Single-nucleotide polymorphism" (in en), Wikipedia, 2019-08-12, https://en.wikipedia.org/w/index.php?title=Single-nucleotide_polymorphism&oldid=910435891, retrieved 2019-09-03 
  8. "Minor allele frequency" (in en), Wikipedia, 2019-08-12, https://en.wikipedia.org/w/index.php?title=Minor_allele_frequency&oldid=910435916, retrieved 2019-09-03 
  9. M. J. Li, J. Wang, "Current trend of annotating single nucleotide variation in humans – A case study on SNVrap", Elsevier, 2014, pp. 1–9
  10. "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews. Genetics 10 (1): 57–63. January 2009. doi:10.1038/nrg2484. PMID 19015660. 
  11. "Disease-associated mutations that alter the RNA structural ensemble". PLOS Genetics 6 (8): e1001074. August 2010. doi:10.1371/journal.pgen.1001074. PMID 20808897. 
  12. "Landscape and variation of RNA secondary structure across the human transcriptome". Nature 505 (7485): 706–709. January 2014. doi:10.1038/nature12946. PMID 24476892. Bibcode: 2014Natur.505..706W.
  13. "Understanding the contribution of synonymous mutations to human disease". Nature Reviews. Genetics 12 (10): 683–691. August 2011. doi:10.1038/nrg3051. PMID 21878961. 
  14. "Exploring the function of genetic variants in the non-coding genomic regions: approaches for identifying human regulatory variants affecting gene expression". Briefings in Bioinformatics 16 (3): 393–412. May 2015. doi:10.1093/bib/bbu018. PMID 24916300. 
  15. "Functional variants at the 11q13 risk locus for breast cancer regulate cyclin D1 expression through long-range enhancers". American Journal of Human Genetics 92 (4): 489–503. April 2013. doi:10.1016/j.ajhg.2013.01.002. PMID 23540573. 
  16. "Genome-wide prediction of splice-modifying SNPs in human genes using a new analysis pipeline called AASsites". BMC Bioinformatics 12 (Suppl 4): S2. 2011. doi:10.1186/1471-2105-12-s4-s2. PMID 21992029. 
  17. "Human disease-associated genetic variation impacts large intergenic non-coding RNA expression". PLOS Genetics 9 (1): e1003201. 2013. doi:10.1371/journal.pgen.1003201. PMID 23341781. 
  18. M. J. Li, J. Wang, "Current trend of annotating single nucleotide variation in humans – A case study on SNVrap", Elsevier, 2014, pp. 1–9
  19. J. Wu, R. Jiang, "Prediction of Deleterious Nonsynonymous Single-Nucleotide Polymorphism for Human Diseases", The Scientific World Journal, 2013, 10 pages
  20. "SIFT web server: predicting effects of amino acid substitutions on proteins". Nucleic Acids Research 40 (Web Server issue): W452–W457. July 2012. doi:10.1093/nar/gks539. PMID 22689647. 
  21. "A method and server for predicting damaging missense mutations". Nature Methods 7 (4): 248–249. April 2010. doi:10.1038/nmeth0410-248. PMID 20354512. 
  22. "MutationTaster evaluates disease-causing potential of sequence alterations". Nature Methods 7 (8): 575–576. August 2010. doi:10.1038/nmeth0810-575. PMID 20676075. 
  23. "PopViz: a webserver for visualizing minor allele frequencies and damage prediction scores of human genetic variations". Bioinformatics 34 (24): 4307–4309. December 2018. doi:10.1093/bioinformatics/bty536. PMID 30535305.
  24. M. J. Li, J. Wang, "Current trend of annotating single nucleotide variation in humans – A case study on SNVrap", Elsevier, 2014, pp. 1–9
  25. "PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants". Journal of Molecular Biology 431 (13): 2460–2466. June 2019. doi:10.1016/j.jmb.2019.04.043. PMID 31075275. 
  26. "Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated?". Journal of Molecular Biology 431 (11): 2197–2212. May 2019. doi:10.1016/j.jmb.2019.04.009. PMID 30995449. 
  27. "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3". Fly 6 (2): 80–92. 2012. doi:10.4161/fly.19695. PMID 22728672.
  28. "The Ensembl Variant Effect Predictor". Genome Biology 17 (1): 122. June 2016. doi:10.1186/s13059-016-0974-4. PMID 27268795. 
  29. "ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data". Nucleic Acids Research 38 (16): e164. September 2010. doi:10.1093/nar/gkq603. PMID 20601685. 
  30. "Jannovar: a java library for exome annotation". Human Mutation 35 (5): 548–555. May 2014. doi:10.1002/humu.22531. PMID 24677618. 
  31. "Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information". Bioinformatics 22 (22): 2729–2734. November 2006. doi:10.1093/bioinformatics/btl423. PMID 16895930. 
  32. "Predicting functional effect of human missense mutations using PolyPhen-2". Current Protocols in Human Genetics Chapter 7: Unit7.20. January 2013. doi:10.1002/0471142905.hg0720s76. PMID 23315928. 
  33. "MutationTaster evaluates disease-causing potential of sequence alterations". Nature Methods 7 (8): 575–576. August 2010. doi:10.1038/nmeth0810-575. PMID 20676075. 
  34. "SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features". Journal of Molecular Biology 426 (14): 2692–2701. July 2014. doi:10.1016/j.jmb.2014.04.026. PMID 24810707. 
  35. "F-SNP: computationally predicted functional SNPs for disease association studies". Nucleic Acids Research 36 (Database issue): D820–D824. January 2008. doi:10.1093/nar/gkm904. PMID 17986460. 
  36. "AnnTools: a comprehensive and versatile annotation toolkit for genomic variants". Bioinformatics 28 (5): 724–725. March 2012. doi:10.1093/bioinformatics/bts032. PMID 22257670. 
  37. "SNPit: a federated data integration system for the purpose of functional SNP annotation". Computer Methods and Programs in Biomedicine 95 (2): 181–189. August 2009. doi:10.1016/j.cmpb.2009.02.010. PMID 19327864. 
  38. "SNAP: predict effect of non-synonymous polymorphisms on function". Nucleic Acids Research 35 (11): 3823–3835. 2007. doi:10.1093/nar/gkm238. PMID 17526529. 
  39. "Functional annotations improve the predictive score of human disease-related mutations in proteins". Human Mutation 30 (8): 1237–1244. August 2009. doi:10.1002/humu.21047. PMID 19514061. 
  40. "LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources". Bioinformatics 21 (12): 2814–2820. June 2005. doi:10.1093/bioinformatics/bti442. PMID 15827081. 
  41. "TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data". Bioinformatics 28 (2): 277–278. January 2012. doi:10.1093/bioinformatics/btr612. PMID 22088845. 
  42. "Snpdat: easy and rapid annotation of results from de novo snp discovery projects for model and non-model organisms". BMC Bioinformatics 14: 45. February 2013. doi:10.1186/1471-2105-14-45. PMID 23390980. 
  43. "In-depth annotation of SNPs arising from resequencing projects using NGS-SNP". Bioinformatics 27 (16): 2300–2301. August 2011. doi:10.1093/bioinformatics/btr372. PMID 21697123. 
  44. "SVA: software for annotating and visualizing sequenced human genomes". Bioinformatics 27 (14): 1998–2000. July 2011. doi:10.1093/bioinformatics/btr317. PMID 21624899. 
  45. "VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing". Nucleic Acids Research 40 (Web Server issue): W54–W58. July 2012. doi:10.1093/nar/gks572. PMID 22693211. 
  46. "SIFT: Predicting amino acid changes that affect protein function". Nucleic Acids Research 31 (13): 3812–3814. July 2003. doi:10.1093/nar/gkg509. PMID 12824425. 
  47. "Improved measures for evolutionary conservation that exploit taxonomy distances". Nature Communications 10 (1): 1556. April 2019. doi:10.1038/s41467-019-09583-2. PMID 30952844. Bibcode: 2019NatCo..10.1556M.
  48. "LIST-S2: taxonomy based sorting of deleterious missense mutations across species". Nucleic Acids Research 48 (W1): W154–W161. July 2020. doi:10.1093/nar/gkaa288. PMID 32352516. 
  49. "FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization". Nucleic Acids Research 34 (Web Server issue): W635–W641. July 2006. doi:10.1093/nar/gkl236. PMID 16845089. 
  50. "PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways". Nucleic Acids Research 35 (Database issue): D247–D252. January 2007. doi:10.1093/nar/gkl869. PMID 17130144. 
  51. "Collective judgment predicts disease-associated single nucleotide variants". BMC Genomics 14 (Suppl 3): S2. 2013. doi:10.1186/1471-2164-14-S3-S2. PMID 23819846. 
  52. "ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data". Nucleic Acids Research 38 (16): e164. September 2010. doi:10.1093/nar/gkq603. PMID 20601685. 
  53. "charite/jannovar". https://github.com/charite/jannovar. 
  54. "The Ensembl Variant Effect Predictor". Genome Biology 17 (1): 122. June 2016. doi:10.1186/s13059-016-0974-4. PMID 27268795. 
  55. "AnnTools: a comprehensive and versatile annotation toolkit for genomic variants". Bioinformatics 28 (5): 724–725. March 2012. doi:10.1093/bioinformatics/bts032. PMID 22257670. 
  56. "Input Variation List File for Annotation". SeattleSeq Annotation 151. http://snp.gs.washington.edu/SeattleSeqAnnotation. 
  57. "VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing". Nucleic Acids Research 40 (Web Server issue): W54–W58. July 2012. doi:10.1093/nar/gks572. PMID 22693211. 
  58. "A survey of tools for variant analysis of next-generation genome sequencing data". Briefings in Bioinformatics 15 (2): 256–278. March 2014. doi:10.1093/bib/bbs086. PMID 23341494. 
  59. "Principles and methods of in-silico prioritization of non-coding regulatory variants". Human Genetics 137 (1): 15–30. January 2018. doi:10.1007/s00439-017-1861-0. PMID 29288389. 
  60. "Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale". Nature Genetics 52 (9): 969–983. September 2020. doi:10.1038/s41588-020-0676-4. PMID 32839606. 
  61. "Functional mapping and annotation of genetic associations with FUMA". Nature Communications 8 (1): 1826. November 2017. doi:10.1038/s41467-017-01261-5. PMID 29184056. 
  62. "A framework for detecting noncoding rare-variant associations of large-scale whole-genome sequencing studies". Nature Methods 19 (12): 1599–1611. December 2022. doi:10.1038/s41592-022-01640-x. PMID 36303018. 
  63. "STAARpipeline: an all-in-one rare-variant tool for biobank-scale whole-genome sequencing data". Nature Methods 19 (12): 1532–1533. December 2022. doi:10.1038/s41592-022-01641-w. PMID 36316564. 
  64. Li, Xihao; Quick, Corbin; Zhou, Hufeng; Gaynor, Sheila M.; Liu, Yaowu; Chen, Han; Selvaraj, Margaret Sunitha; Sun, Ryan et al. (January 2023). "Powerful, scalable and resource-efficient meta-analysis of rare variant associations in large whole genome sequencing studies". Nature Genetics 55 (1): 154–164. doi:10.1038/s41588-022-01225-6. PMID 36564505.