Sequence graph

From HandWiki
Short description: Graph in comparative genomics

In comparative genomics, a sequence graph, also called an alignment graph, breakpoint graph, or adjacency graph, is a bidirected graph in which the vertices represent segments of DNA and the edges represent adjacencies between segments in a genome.[1] Each segment is labeled by the DNA string it represents, and each adjacency edge connects the tail end of one segment to the head end of another. Each adjacency edge is also labeled by a (possibly empty) DNA string. Traversing a connected component of segments and adjacency edges, called a thread, yields a sequence, which typically represents a genome or a section of a genome. The segments can be thought of as synteny blocks: the adjacency edges dictate how these blocks are arranged in a particular genome, and the adjacency labels hold the bases that are not contained in any synteny block.
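The definition above can be made concrete with a small data structure. The following is a minimal Python sketch, not a standard implementation; the names `Segment`, `SequenceGraph`, `add_adjacency`, and `thread_sequence` are hypothetical. As a simplification, it assumes every segment is read forward (head to tail), that each tail has at most one outgoing adjacency, and it ignores reverse complements, so it only covers the linear-thread case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A vertex of the graph: a named DNA segment, read head -> tail here."""
    name: str
    dna: str


@dataclass
class SequenceGraph:
    segments: dict[str, Segment] = field(default_factory=dict)
    # Keyed by the segment whose tail end the edge leaves from; the value is
    # (segment whose head end the edge enters, DNA label of the adjacency).
    adjacencies: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add_segment(self, name: str, dna: str) -> None:
        self.segments[name] = Segment(name, dna)

    def add_adjacency(self, tail_of: str, head_of: str, label: str = "") -> None:
        """Join the tail end of `tail_of` to the head end of `head_of`."""
        self.adjacencies[tail_of] = (head_of, label)

    def thread_sequence(self, start: str) -> str:
        """Spell the thread that begins at the head end of segment `start`."""
        parts: list[str] = []
        current: str | None = start
        visited: set[str] = set()
        while current is not None:
            if current in visited:
                raise ValueError("cycle: this sketch only spells linear threads")
            visited.add(current)
            parts.append(self.segments[current].dna)
            step = self.adjacencies.get(current)
            if step is None:
                current = None  # no adjacency leaves this tail: the thread ends
            else:
                current, label = step
                parts.append(label)  # bases lying between two synteny blocks
        return "".join(parts)


g = SequenceGraph()
g.add_segment("A", "ACGT")
g.add_segment("B", "TTGA")
g.add_segment("C", "CC")
g.add_adjacency("A", "B", label="GG")  # two bases outside any block between A and B
g.add_adjacency("B", "C")              # empty label: B and C are directly adjacent
print(g.thread_sequence("A"))  # ACGTGGTTGACC
```

Spelling a thread interleaves segment strings with adjacency labels, so an empty label simply joins two blocks end to end.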

Applications

Multiple sequence alignment

Sequence graphs can be used to represent multiple sequence alignments by adding a new kind of edge that represents homology between segments.[2] For a set of genomes, one can create an acyclic breakpoint graph with a thread for each genome. For two segments $(a, b)$ and $(c, d)$, where $a$, $b$, $c$, and $d$ are the endpoints of the two segments, homology edges can be created either from $a$ to $c$ and from $b$ to $d$, or from $a$ to $d$ and from $b$ to $c$; these two choices represent the two possible orientations of the homology. The advantage of representing a multiple sequence alignment this way is that it can include inversions and other structural rearrangements that cannot be expressed in a matrix representation.
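The two orientation choices can be sketched as follows. This is an illustrative Python sketch, assuming (a labeling choice not stated in the source) that a and c are the head ends and b and d the tail ends of their segments; `Side`, `homology_edges`, and `is_inverted` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Side:
    """One endpoint of a segment: its head end or its tail end."""
    segment: str
    end: str  # "head" or "tail"


def homology_edges(seg1: str, seg2: str, same_orientation: bool) -> list[tuple[Side, Side]]:
    """Return the pair of homology edges between seg1 = (a, b) and seg2 = (c, d).

    Assumes a and c are head ends, b and d are tail ends.
    """
    a, b = Side(seg1, "head"), Side(seg1, "tail")
    c, d = Side(seg2, "head"), Side(seg2, "tail")
    if same_orientation:
        return [(a, c), (b, d)]  # a-c and b-d: both segments read the same way
    return [(a, d), (b, c)]      # a-d and b-c: one segment is reverse-oriented


def is_inverted(edges: list[tuple[Side, Side]]) -> bool:
    """True when every homology edge joins a head to a tail (opposite orientation)."""
    return all(x.end != y.end for x, y in edges)


# Segment s1 in one genome is homologous to an inverted copy s2 in another.
inverted = homology_edges("s1", "s2", same_orientation=False)
print(is_inverted(inverted))                                           # True
print(is_inverted(homology_edges("s1", "s2", same_orientation=True)))  # False
```

Because orientation is carried by which ends the homology edges join, an inverted homolog needs no special treatment, whereas a row-and-column alignment matrix has no way to record it.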

Representing variation

If there are multiple possible paths when traversing a thread in a sequence graph, the same thread can represent multiple sequences. This makes it possible to build a sequence graph that represents a population of individuals with slightly different genomes, with each genome corresponding to one path through the graph. Such graphs have been proposed as a replacement for the human reference genome.[3]
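Enumerating the paths through a small graph shows how one structure can encode several genomes. The Python sketch below uses a hypothetical `enumerate_paths` function and assumes an acyclic graph with forward-only segments and empty adjacency labels; it models a single-base difference between two individuals as a "bubble" of two alternative segments.

```python
from __future__ import annotations


def enumerate_paths(segments: dict[str, str], edges: dict[str, list[str]],
                    start: str, end: str) -> list[str]:
    """Spell every sequence obtained by walking from `start` to `end`.

    Assumes an acyclic graph whose segments are all read forward and whose
    adjacency labels are empty (simplifications of this sketch).
    """
    results: list[str] = []

    def walk(node: str, prefix: str) -> None:
        seq = prefix + segments[node]
        if node == end:
            results.append(seq)  # one complete path = one genome
            return
        for nxt in edges.get(node, []):
            walk(nxt, seq)  # branch at every alternative successor

    walk(start, "")
    return results


# A bubble: two individuals differ at one base (G vs T).
segments = {"left": "ACG", "ref": "G", "alt": "T", "right": "CAT"}
edges = {"left": ["ref", "alt"], "ref": ["right"], "alt": ["right"]}
print(enumerate_paths(segments, edges, "left", "right"))  # ['ACGGCAT', 'ACGTCAT']
```

With k independent bubbles the number of paths grows as 2^k, so exhaustive enumeration like this is only practical for small graphs.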

References

  1. Alekseyev, M. A.; Pevzner, P. A. (2009-02-13). "Breakpoint graphs and ancestral genome reconstructions". Genome Research (Cold Spring Harbor Laboratory) 19 (5): 943–957. doi:10.1101/gr.082784.108. ISSN 1088-9051. PMID 19218533. 
  2. Paten, Benedict; Zerbino, Daniel R; Hickey, Glenn; Haussler, David (2014-06-19). "A unifying model of genome evolution under parsimony". BMC Bioinformatics (Springer Science and Business Media LLC) 15 (1): 206. doi:10.1186/1471-2105-15-206. ISSN 1471-2105. PMID 24946830. 
  3. Paten, Benedict; Novak, Adam; Haussler, David (2014-04-20). "Mapping to a Reference Genome Structure". arXiv:1404.5010 [q-bio.GN].