Helicos single molecule fluorescent sequencing

The Helicos Genetic Analysis System platform was the first commercial next-generation sequencing (NGS) implementation to use single molecule fluorescent sequencing, a method for determining the exact sequence of a piece of DNA. It was marketed by the now-defunct Helicos Biosciences.

Fragments of DNA are first hybridized in place on disposable glass flow cells. Fluorescent nucleotides are then added one at a time, with a terminating nucleotide used to pause the process until an image has been captured. From the image, one nucleotide of each DNA sequence can be determined. The fluorescent label is then cleaved off, and the process is repeated until the fragments have been completely sequenced.[1]

This sequencing method and equipment were used to sequence the genome of the M13 bacteriophage.[2]

Preparing the DNA

Fragmenting the DNA

The Helicos Genetic Analysis System can sequence nucleic acids from several nucleotides to several thousand nucleotides in length. However, the yield of sequences per unit mass depends on the number of 3’ hydroxyl ends, so relatively short templates are sequenced more efficiently than long ones. Helicos recommends a length of less than 1000 nt (nucleotides), optimally about 100 to 200 nt. Long fragments can be cleaved by shearing the DNA (the recommended approach) or by digestion with restriction enzymes. Short fragments are removed to improve yield.[3]

Tailing

DNA samples are hybridized to a primer immobilized on the flow cell for sequencing, so it is usually necessary to generate a nucleic acid with an end compatible with hybridization to that surface. The sequence attached to the flow cell surface could, in theory, be any sequence that can be synthesized, but in practice the standard commercially available flow cell carries oligo(dT)50. To be compatible with the oligo(dT)50 primer on the flow cell surface, the molecule to be sequenced needs a poly(dA) tail of at least 50 nt at its 3’ end. Because the fill and lock step will fill in excess A’s but not excess T’s, it is desirable for the A tail to be at least as long as the oligo(dT) on the surface. A 3’ poly(dA) tail can be generated with a variety of ligases or polymerases. If there is sufficient DNA to measure both mass and average length, the amount of dATP needed to generate poly(dA) tails 90 to 200 nucleotides long can be determined: first estimate how many 3’ ends there are in the sample, then use the right ratio of DNA, dATP, and terminal transferase to obtain the optimal size range of tails.[4]
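The estimate described above can be sketched as simple arithmetic. This is only an illustration: the average mass of about 330 g/mol per nucleotide is a common textbook approximation rather than a figure from the Helicos protocol, the function names are hypothetical, and the calculation assumes that every dATP molecule supplied ends up in a tail, which a real terminal transferase reaction will not achieve exactly.

```python
# Rough estimate of 3' ends in a sample and of the dATP needed for tailing.
# Assumption (not from the source): ~330 g/mol average mass per nucleotide.
# For double-stranded DNA the same formula holds for the total mass, since
# each strand contributes one 3' end.
AVG_NT_MASS_G_PER_MOL = 330.0


def three_prime_ends_pmol(mass_ng: float, avg_length_nt: float) -> float:
    """Picomoles of 3' ends in a DNA sample of given mass and mean length."""
    grams = mass_ng * 1e-9
    moles_of_strands = grams / (avg_length_nt * AVG_NT_MASS_G_PER_MOL)
    return moles_of_strands * 1e12


def datp_pmol_for_tail(ends_pmol: float, tail_length_nt: float) -> float:
    """Picomoles of dATP for tails of the target length, assuming
    complete and even incorporation (an idealization)."""
    return ends_pmol * tail_length_nt


if __name__ == "__main__":
    ends = three_prime_ends_pmol(mass_ng=100, avg_length_nt=150)
    print(f"3' ends: {ends:.2f} pmol")
    print(f"dATP for 150 nt tails: {datp_pmol_for_tail(ends, 150):.0f} pmol")
```

The same function also illustrates why short templates give a higher yield per unit mass: for a fixed mass, the number of 3’ ends is inversely proportional to the average fragment length.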

Blocking

If the tailed DNA targeted for sequencing is hybridized to the flow cell directly after tailing, it would have a free 3’ hydroxyl that could be extended in the sequencing reaction just like the surface-bound primer and potentially confuse the sequence determination. Thus, prior to sequencing, it is also necessary to block the 3’ ends of the molecules to be sequenced. Any 3’ end treatment that makes the molecule unsuitable for extension can be used. Typically, tailed molecules are blocked using terminal transferase and a dideoxynucleotide, but any treatment that leaves a 3’ phosphate or other modification that prevents extension can be similarly effective.[5]

DNA sequencing

Sample loading

Single molecule fluorescent sequencing is carried out on a glass flow cell with 25 channels for the same or different samples. The system can be run with either one or two flow cells at a time. In the standard configuration, each channel is equivalent and holds approximately 8 μl. Samples are generally loaded in a higher volume (usually 20 μl or more) to ensure even hybridization along the length of the flow cell. Samples are inserted into the flow cell via the sample loader included with the overall system. Each channel is individually addressable, and the sample is applied using a vacuum. Hybridization to the flow cell is typically carried out at 55 °C for 1 hour.[6]

Filling and locking

Generally, samples for sequencing are prepared in such a way that the poly(A) tail is longer than the oligo(dT)50 on the surface of the flow cell. To avoid sequencing the unpaired A residues, a fill and lock treatment is needed. After hybridization, the temperature is lowered to 37 °C, and then dTTP and Virtual Terminator nucleotides[7] corresponding to dATP, dCTP, and dGTP are added along with DNA polymerase. Virtual Terminator nucleotides incorporate opposite the complementary base and prevent further incorporation because of the chemical structure appended to the nucleotide. Thus, all of the unpaired dA residues in the poly(A) tail are filled in with dTTP. The hybridized molecule is locked in place when the polymerase encounters the first non-A residue and inserts the appropriate Virtual Terminator nucleotide. Because every DNA molecule should now have a dye attached, an image will include all molecules capable of nucleotide incorporation. However, because the label could correspond to any base, no sequence information is obtained at this stage. Thus, for most molecules, sequencing commences with the second base of the original molecule.[8]
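The fill and lock logic can be illustrated with a toy model. The sketch below is not Helicos software: the function name is hypothetical, the template is written starting from its 3’ poly(A) tail, and the surface primer is assumed to anneal flush with the 3’ end of the tail.

```python
# Toy model of the fill and lock step.
# Assumptions (illustrative only): the template string is read 3'->5',
# starting with its poly(A) tail, and the oligo(dT) primer pairs with the
# first `primer_len` A residues of that tail.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fill_and_lock(template_3to5: str, primer_len: int = 50):
    """Return (number of T's filled in, locking terminator base,
    remaining template still to be sequenced)."""
    # Count the run of A's at the start of the template (the tail).
    tail_len = len(template_3to5) - len(template_3to5.lstrip("A"))
    # A's not covered by the primer are filled in with plain dTTP.
    filled_t = max(0, tail_len - primer_len)
    if tail_len == len(template_3to5):
        # The template is all A's: nothing to lock onto.
        return filled_t, None, ""
    # The first non-A residue receives its complementary terminator
    # nucleotide (A, C, or G), which blocks further extension.
    lock_base = COMPLEMENT[template_3to5[tail_len]]
    remaining = template_3to5[tail_len + 1:]
    return filled_t, lock_base, remaining


if __name__ == "__main__":
    template = "A" * 60 + "GCT"
    print(fill_and_lock(template))
```

In this model the base used for locking is consumed before any sequence-specific imaging takes place, which mirrors the statement that reads usually start at the second base of the original molecule.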

Sequencing

Chemistry cycle

In order to sequence the hybridized DNAs, it is first necessary to cleave off the fluorescent dye and terminator moieties present on the virtual terminator nucleotides. The current generation of nucleotides is synthesized with a disulfide linkage that can be rapidly and completely cleaved. Following cleavage, the now-separated fluorescent dyes are washed away and then new polymerase and a single fluorescent nucleotide are added. After excitation of the fluorescent moiety by the system laser, another image is taken, and, on a standard sequencing run, this cyclic process is repeated 120 times. The number of sequencing cycles is user adjustable and can be modified depending on user needs for run time and length of read. During a standard run, two 25-channel flow cells are used, with each flow cell alternating between the chemistry cycle and the imaging cycle.[9]
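To make the relationship between cycle count and read length concrete, the cyclic process can be mimicked in a few lines. This is a simplified sketch under stated assumptions: the nucleotide addition order `CTAG` and the function name are chosen for illustration and are not taken from the source, every matching incorporation is assumed to succeed, and at most one base is added per cycle because of the terminator.

```python
# Simplified simulation of one-nucleotide-per-cycle sequencing by synthesis.
# Assumptions (illustrative): a fixed, repeating addition order and perfect
# incorporation and cleavage efficiency.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def simulate_read(template_3to5: str, cycles: int = 120,
                  order: str = "CTAG") -> str:
    """Return the bases incorporated after `cycles` nucleotide additions.

    `template_3to5` is the template read in the direction the polymerase
    travels along it.
    """
    read = []
    pos = 0  # next template position to be copied
    for cycle in range(cycles):
        if pos >= len(template_3to5):
            break  # template fully copied
        offered = order[cycle % len(order)]
        # The terminator allows only one incorporation per cycle, so a
        # homopolymer run needs one matching cycle per base.
        if COMPLEMENT[template_3to5[pos]] == offered:
            read.append(offered)
            pos += 1
    return "".join(read)


if __name__ == "__main__":
    print(simulate_read("GATTACA"))            # full complement
    print(simulate_read("GATTACA", cycles=3))  # truncated by cycle count
```

Because only one of the four nucleotides is offered in each cycle, many cycles pass without an incorporation on any given molecule, so the read length is considerably shorter than the number of cycles.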

Imaging cycle

During the imaging process, four lasers illuminate 1100 fields of view (FOV) per channel, with pictures taken by four charge-coupled device (CCD) cameras via a confocal microscope. Though single molecules are visualized, multiple photon emissions are registered for each molecule, with the time spent at each FOV dependent on the brightness of the dye in the particular nucleotide as well as on camera speed and detection efficiency. Imaging is the rate-determining step, and run time could be reduced at the expense of throughput by reducing the number of FOV per channel.[10]

Throughput

Under optimal conditions, for a standard 120-cycle, 1100-field-of-view run, 12,000,000 to 20,000,000 reads that are 25 nucleotides or longer and align to the reference genome can be expected from each channel, for a total of up to 1,000,000,000 aligned reads and 35 Gb of sequence from each run. A full run takes up to 8 days to complete.[citation needed]
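These figures are mutually consistent, as a short calculation shows. The sketch assumes a run with two 25-channel flow cells, as described for a standard run; the variable names are illustrative.

```python
# Consistency check of the throughput figures quoted above.
CHANNELS_PER_FLOW_CELL = 25
FLOW_CELLS_PER_RUN = 2  # standard run, per the chemistry cycle description
READS_PER_CHANNEL = (12_000_000, 20_000_000)  # aligned reads, low and high
TOTAL_BASES_PER_RUN = 35_000_000_000  # 35 Gb

channels = CHANNELS_PER_FLOW_CELL * FLOW_CELLS_PER_RUN
total_reads_low = READS_PER_CHANNEL[0] * channels
total_reads_high = READS_PER_CHANNEL[1] * channels

# Mean aligned read length implied by 35 Gb at the upper read count.
implied_mean_read_length = TOTAL_BASES_PER_RUN / total_reads_high

if __name__ == "__main__":
    print(f"channels per run: {channels}")
    print(f"aligned reads per run: {total_reads_low:,} to {total_reads_high:,}")
    print(f"implied mean read length: {implied_mean_read_length:.0f} nt")
```

The upper bound of 20,000,000 reads across 50 channels gives the quoted 1,000,000,000 reads, and 35 Gb spread over that many reads implies a mean aligned read length of about 35 nucleotides.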

Advantages and disadvantages

  • The single molecule sequencing strategy simplifies the DNA sample preparation process, avoids PCR-induced bias and errors, simplifies data analysis, and tolerates degraded samples.
  • Because the process is halted between each extension step, the time to sequence a single nucleotide is high, and the read lengths realized are short, at 32 nucleotides.
  • The error rate is high due to noise. This can be overcome with repetitive sequencing, but doing so increases the cost per base for a given accuracy, offsetting some of the gains from lower reagent costs. Raw read error rates are generally around 5%, although the highly parallel nature of this technology can deliver high fold coverage and a consensus or finished read accuracy of 99%.[11]
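The effect of coverage on consensus accuracy can be illustrated with a simple binomial model. This is a deliberately crude sketch: it assumes errors are independent between reads and that the consensus is wrong only when more than half of the reads covering a position are in error, neither of which is claimed by the source. The function name is hypothetical.

```python
# Crude binomial model of consensus accuracy from repeated reads.
# Assumptions (illustrative): independent errors at a fixed per-read rate,
# and a consensus call that fails only if a strict majority of reads at a
# position are wrong. Ties at even coverage are not counted as failures.
from math import comb


def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that a strict majority of `coverage` reads are wrong."""
    threshold = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


if __name__ == "__main__":
    for cov in (1, 3, 5, 9):
        acc = 1 - majority_error(0.05, cov)
        print(f"coverage {cov}: consensus accuracy ~ {acc:.5f}")
```

Under these assumptions, a 5% raw error rate already falls below 1% consensus error at threefold coverage, at the cost of sequencing every position several times.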

References

  1. Thompson, JF; Steinmann, KE (2010). "Single Molecule Sequencing with a HeliScope Genetic Analysis System". Curr Protoc Mol Biol. Chapter 7: Unit 7.10.
  2. "Single-molecule DNA sequencing of a viral genome". Science 320 (5872): 106–9. 4 Apr 2008. doi:10.1126/science.1150427. PMID 18388294. Bibcode: 2008Sci...320..106H.
  3. Morozova, Olena (2008). "Applications of next-generation sequencing technologies in functional genomics". Genomics 92 (5): 255–264. doi:10.1016/j.ygeno.2008.07.001. PMID 18703132. 
  4. Bowers, Jayson (2009). "Virtual terminator nucleotides for next-generation DNA sequencing". Nature Methods 6 (8): 593–595. doi:10.1038/nmeth.1354. PMID 19620973. 
  5. Mamanova, Lira (2010). "Target-enrichment strategies for next-generation sequencing". Nat Methods 7 (2): 111–118. doi:10.1038/nmeth.1419. PMID 20111037. 
  6. Brady, J (2011). "Optimum cell hybridization conditions in Helicos-based next-generation sequencing". Journal of Biomolecular Technology 24 (5): 211–230. 
  7. Bowers, J.; Mitchell, J.; Beer, E.; Buzby, P.R.; Causey, M.; Efcavitch, J.W.; Jarosz, M.; Krzymanska-Olejnik, E. et al. (2009). "Virtual terminator nucleotides for next-generation DNA sequencing". Nat. Methods 6 (8): 593–595. doi:10.1038/nmeth.1354. PMID 19620973. 
  8. Glenn, T (2011). "Field guide to next‐generation DNA sequencers". Molecular Ecology Resources 11 (5): 759–769. doi:10.1111/j.1755-0998.2011.03024.x. PMID 21592312. 
  9. Buermans, Dunnen (2014). "Next generation sequencing technology: advances and applications". Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease 1842 (10): 1932–1941. doi:10.1016/j.bbadis.2014.06.015. PMID 24995601. 
  10. Buermans, Dunnen (2014). "Next generation sequencing technology: advances and applications". Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease 1842 (10): 1932–1941. doi:10.1016/j.bbadis.2014.06.015. PMID 24995601. 
  11. Crosetto (2013). "Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing". Nature Methods 10 (4): 361–5. doi:10.1038/nmeth.2408. PMID 23503052. 