Software:MUMmer

From HandWiki

MUMmer is a bioinformatics software system for sequence alignment. It is based on the suffix tree data structure. It has been used to compare different genome assemblies with one another, which allows scientists to determine how a genome has changed. The name "MUMmer" comes from "Maximal Unique Matches", or MUMs.

The original algorithms in the MUMmer software package were designed by Arthur Delcher, Simon Kasif and Steven Salzberg. MUMmer was the first whole-genome comparison system developed in bioinformatics. It was originally applied to the comparison of two related strains of bacteria.

The MUMmer software is open source. The system is maintained primarily by Steven Salzberg and Arthur Delcher at the Center for Computational Biology at Johns Hopkins University.

MUMmer is a highly cited bioinformatics system in the scientific literature. According to Google Scholar, as of early 2013 the original MUMmer paper (Delcher et al., 1999)[1] has been cited 691 times; the MUMmer 2 paper (Delcher et al., 2002)[2] has been cited 455 times; and the MUMmer 3.0 article (Kurtz et al., 2004)[3] has been cited 903 times.

Overview

MUMmer is a fast system for aligning entire genomes. It has been released in four major versions.

Versions of MUMmer

MUMmer 1

MUMmer 1 (originally just MUMmer) works in three stages: first, a suffix tree is built to find the MUMs; second, a longest increasing subsequence (or longest common subsequence) computation puts the MUMs in a consistent order; and finally, the gaps between the ordered MUMs are closed by other alignment methods.
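The first two stages can be illustrated on toy sequences. This is a minimal sketch under stated assumptions, not MUMmer's implementation: MUMmer finds MUMs with a suffix tree in linear time, whereas `find_mums` below uses a quadratic brute-force scan purely to show the definition (a match that cannot be extended left or right and occurs exactly once in each sequence). `chain_mums` is a plain, unweighted longest increasing subsequence over the MUM positions; the variation MUMmer actually uses may weigh MUM lengths or handle overlaps differently. All function names are hypothetical.

```python
from bisect import bisect_left


def count_occurrences(text: str, sub: str) -> int:
    """Count (possibly overlapping) occurrences of sub in text."""
    count, start = 0, text.find(sub)
    while start != -1:
        count += 1
        start = text.find(sub, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 1) -> list[tuple[int, int, int]]:
    """Brute-force maximal unique matches, returned as (pos_a, pos_b, length).

    Illustration only: a suffix tree would find the same matches far faster.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Start only where the match is left-maximal (sequence start or mismatch on the left).
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            # Extend right until a mismatch or the end of either sequence (right-maximal).
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            match = a[i:i + length]
            # Keep the match only if it is unique in both sequences.
            if (length >= min_len and count_occurrences(a, match) == 1
                    and count_occurrences(b, match) == 1):
                mums.append((i, j, length))
    return mums


def chain_mums(mums: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Longest chain of MUMs whose positions increase in both sequences.

    MUMs are sorted by pos_a, then a longest strictly increasing
    subsequence over pos_b is found in O(n log n).
    """
    mums = sorted(mums)
    tails: list[int] = []      # tails[k]: index of the MUM ending the best chain of length k + 1
    tail_keys: list[int] = []  # pos_b of each tail, kept sorted for binary search
    prev = [-1] * len(mums)    # back-pointers used to rebuild the chain
    for idx, (_, pos_b, _) in enumerate(mums):
        k = bisect_left(tail_keys, pos_b)
        if k > 0:
            prev[idx] = tails[k - 1]
        if k == len(tails):
            tails.append(idx)
            tail_keys.append(pos_b)
        else:
            tails[k] = idx
            tail_keys[k] = pos_b
    chain = []
    idx = tails[-1] if tails else -1
    while idx != -1:
        chain.append(mums[idx])
        idx = prev[idx]
    return chain[::-1]
```

For example, `find_mums("AACCGGTT", "CCGGAATT", min_len=2)` yields `(0, 4, 2)`, `(2, 0, 4)` and `(6, 6, 2)`. The first two cross each other (their order differs between the sequences), so a consistent chain can keep only one of them, and `chain_mums` returns `[(2, 0, 4), (6, 6, 2)]`.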

Interruptions between aligned MUMs are known as gaps, and other alignment algorithms are used to fill them. Gaps fall into the following four classes:[4]

  • An SNP interruption – a single character differs between the two sequences.
  • An insertion – a subsequence appears in only one of the two sequences, leaving an empty gap in the other.
  • A highly polymorphic region – a subsequence in which every character differs between the two sequences.
  • A repeat – a sequence that occurs more than once. Because MUMs must be unique, a repeated sequence cannot itself be a MUM and therefore shows up as a gap.
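The first three gap classes can be told apart with simple length and character checks. The classifier below is a hypothetical sketch, not MUMmer's own logic. It assumes the two gap strings have already been cut out between consecutive MUMs: for MUMs `(a1, b1, l1)` and `(a2, b2, l2)`, these are the slices `a[a1 + l1:a2]` and `b[b1 + l1:b2]`. It does not try to detect repeats and reports anything it cannot resolve as `"other"`.

```python
def classify_gap(gap_a: str, gap_b: str) -> str:
    """Heuristically label the unaligned stretch between two consecutive MUMs.

    gap_a and gap_b are the characters of each sequence lying between the
    two MUMs. The rules are a simplified reading of the four gap classes.
    """
    if not gap_a and not gap_b:
        return "none"          # the MUMs are directly adjacent in both sequences
    if len(gap_a) == 1 and len(gap_b) == 1:
        return "SNP"           # a single differing character
    if not gap_a or not gap_b:
        return "insertion"     # characters present in only one sequence
    if len(gap_a) == len(gap_b) and all(x != y for x, y in zip(gap_a, gap_b)):
        return "polymorphic"   # every aligned character differs
    return "other"             # e.g. a repeat or mixed region; not resolved here
```

For instance, `classify_gap("A", "G")` gives `"SNP"`, `classify_gap("", "TTA")` gives `"insertion"` and `classify_gap("ACG", "TGA")` gives `"polymorphic"`.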

MUMmer 2

MUMmer 2 was redesigned to use less memory and to run faster and more accurately, which also allows it to align larger genomes.

The main improvement was a reduction in the amount of memory needed to store the suffix tree, achieved by adopting the suffix tree implementation created by Kurtz.

MUMmer 3

According to Stefan Kurtz and colleagues, "the most significant technical improvement in MUMmer 3.0 is a complete rewrite of the suffix-tree code, based on the compact suffix-tree representation"[5] described in Kurtz's article "Reducing the Space Requirement of Suffix Trees".[6]

MUMmer 4

According to Guillaume Marçais and colleagues, MUMmer4 brings further implementation improvements and adds query parallelism. "MUMmer4 now includes options to save and load the suffix array for a given reference."[7] This allows the suffix array for a reference to be built once, saved, and then loaded in later runs instead of being rebuilt each time.
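The build-once, load-later idea can be sketched with a toy suffix array. This is an illustration under loudly stated assumptions: the naive construction, the JSON file layout and every function name below are hypothetical, and none of them reflect MUMmer4's actual on-disk format, construction algorithm or command-line options.

```python
import json
import tempfile
from pathlib import Path


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def save_index(path: Path, text: str, sa: list[int]) -> None:
    """Write the reference and its suffix array to disk (hypothetical JSON layout)."""
    path.write_text(json.dumps({"text": text, "sa": sa}))


def load_index(path: Path) -> tuple[str, list[int]]:
    """Read back a reference and suffix array written by save_index."""
    data = json.loads(path.read_text())
    return data["text"], data["sa"]


def find(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of pattern via binary search on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Locate the first suffix whose length-m prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # Matching suffixes are contiguous in the suffix array.
    hits = []
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_file = Path(tmp) / "reference.sa.json"
        # First run: build the index once and save it.
        save_index(index_file, "banana", build_suffix_array("banana"))
        # Later runs: load the saved index instead of rebuilding it.
        ref, sa = load_index(index_file)
        print(find(ref, sa, "ana"))  # [1, 3]
```

The saving pays off when many queries are run against the same reference, because the sort that builds the array is the expensive step and each lookup afterwards is a binary search.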

Software - Open Source

MUMmer is open-source software and is freely available online.

Related Sequence Alignment Methods

Other sequence alignment methods and tools include:

  • Edit distance
  • BLAST
  • Bowtie
  • BWA
  • BLAT
  • Mauve
  • LASTZ

References

  1. Delcher, A. L.; Kasif, S.; Fleischmann, R. D.; Peterson, J.; White, O.; Salzberg, S. L. (1999). "Alignment of whole genomes". Nucleic Acids Research 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMID 10325427. 
  2. Delcher, A. L.; Phillippy, A.; Carlton, J.; Salzberg, S. L. (2002). "Fast algorithms for large-scale genome alignment and comparison". Nucleic Acids Research 30 (11): 2478–2483. doi:10.1093/nar/30.11.2478. PMID 12034836. 
  3. Kurtz, S.; Phillippy, A.; Delcher, A.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S. (2004). "Versatile and open software for comparing large genomes". Genome Biology 5 (2): R12. doi:10.1186/gb-2004-5-2-r12. PMID 14759262. PMC 395750.
  4. Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S. (1999). "Alignment of Whole Genomes". Nucleic Acids Research 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMID 10325427.
  5. Kurtz, S.; Phillippy, A.; Delcher, A.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S. (2004). "Versatile and open software for comparing large genomes". Genome Biology 5 (2): R12. doi:10.1186/gb-2004-5-2-r12. PMID 14759262. PMC 395750. http://gensoft.pasteur.fr/docs/MUMmer/3.22/MUMmer3.pdf. Retrieved 2021-05-06. 
  6. Kurtz, S. (1999). "Reducing the Space Requirement of Suffix Trees". Software: Practice and Experience 29 (13): 1149–1171. doi:10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O. https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-024X(199911)29:13%3C1149::AID-SPE274%3E3.0.CO;2-O. Retrieved 2021-05-06. 
  7. Marçais, G.; Delcher, A.; Phillippy, A.; Coston, R.; Salzberg, S.; Zimin, A. (2018). "MUMmer4: A fast and versatile genome alignment system". PLOS Computational Biology 14 (1): e1005944. doi:10.1371/journal.pcbi.1005944. PMID 29373581. Bibcode 2018PLSCB..14E5944M.

External links