Biology:ScGET-seq

From HandWiki
Short description: Single-cell sequencing technology


Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin,[1] scGET-seq is also capable of probing inactive heterochromatin.[2]

This is achieved through the use of TnH, which is created by linking the chromodomain (CD) of heterochromatin protein-1-alpha (HP-1α) to the Tn5 transposase. TnH is then able to target histone 3 lysine 9 trimethylation (H3K9me3), a marker for heterochromatin.[3]

Akin to RNA velocity, which uses the ratio of spliced to unspliced RNA to infer the kinetics of changes in gene expression over the course of cellular development,[4] the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility over the course of cellular developmental pathways.[2]

History

Transcriptional regulation is tightly linked to chromatin states. Chromatin that is open, or permissive to transcription, makes up only 2-3% of the genome but encompasses 94.4% of transcription factor binding sites.[5][6] Conversely, more tightly packed DNA, or heterochromatin, is responsible for genome organization and stability.[7] Chromatin density also changes over the course of cellular differentiation,[8] but there is a lack of high-throughput sequencing methods for directly assaying heterochromatin.

Many genome-related diseases, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by single-cell heterogeneity, which can drive metastasis and treatment resistance.[9][10] The mechanisms that underlie these processes are still largely unknown, although the advent of single-cell technologies, including single-cell epigenomics, has contributed greatly to their elucidation.[11]

In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin (euchromatin) for sequencing, became feasible at single-cell resolution.[12] scGET-seq builds upon this technology by also capturing heterochromatin, giving a more comprehensive view of chromatin structure and dynamics within each cell.[13]

Methods

Broad overview of how scGET-seq is performed

Sample preparation

Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material.[14]

The next step is to produce the TnH transposase. Tn5 is a transposase that cuts genomic regions not bound by nucleosomes (open chromatin) and ligates adapters to them.[15] HP-1α is a member of the HP1 family and is able to recognize and specifically bind to H3K9me3.[16][17] Its chromodomain uses an induced-fit mechanism for recognizing this chromatin modification.[18] Linking the first 112 amino acids of HP-1α, which contain the chromodomain, to Tn5 using a three poly-tyrosine-glycine-serine (TGS) linker creates the TnH transposase, which is capable of targeting heterochromatin marked by H3K9me3.[2]

Library preparation is done using a modified protocol for single-cell ATAC-seq,[19] where the nuclei suspension is sequentially incubated with the Tn5 transposase first, and then TnH.[2]

Data analysis

The goals of the data analysis are:[2]

  1. To identify and characterize distinct cell populations using clustering
  2. To profile chromatin accessibility across the genome
  3. To predict copy-number variants and single-nucleotide variants

Pre-processing

  1. Post-sequencing, reads need to be demultiplexed and mapped to the appropriate reference genome. Duplicated reads are identified and removed.
  2. "Peaks", or regions in the DNA enriched in the number of reads mapped, are identified.[20]
  3. Quality control is performed, and cells with low numbers of reads or few detected features are filtered out.
  4. Four count matrices (matrices where each column is a cell and each row is a feature) are generated: Tn5-dhs, Tn5-complement, TnH-dhs and TnH-complement, representing signal from accessible and compacted chromatin.[2]
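The quality-control step above can be illustrated with a minimal sketch. It assumes a count matrix stored as a list of rows (features), each holding one count per cell, as described in step 4. The function name `filter_cells` and the thresholds `min_reads` and `min_features` are hypothetical and chosen for illustration only; they are not taken from the published pipeline.

```python
def filter_cells(counts, min_reads=1000, min_features=200):
    """Drop cells (columns) with too few reads or too few detected features.

    counts: list of rows, one row per feature, one column per cell.
    Returns the filtered matrix and the indices of the cells that were kept.
    The default thresholds are illustrative assumptions, not published values.
    """
    n_cells = len(counts[0]) if counts else 0
    kept = []
    for j in range(n_cells):
        total_reads = sum(row[j] for row in counts)        # reads in cell j
        detected = sum(1 for row in counts if row[j] > 0)  # features seen in cell j
        if total_reads >= min_reads and detected >= min_features:
            kept.append(j)
    filtered = [[row[j] for j in kept] for row in counts]
    return filtered, kept


# Toy matrix: 3 features (rows) x 3 cells (columns).
toy = [
    [5, 0, 1],
    [3, 0, 0],
    [4, 1, 0],
]
filtered, kept = filter_cells(toy, min_reads=3, min_features=2)
print(kept)      # indices of cells passing both thresholds
print(filtered)  # matrix restricted to those cells
```

In this toy example only the first cell passes both thresholds. In practice the same filter would presumably be applied consistently across all four matrices so that they keep the same set of cells.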

Analysis

Dimension reduction, visualization and clustering

Each matrix is filtered of shared regions, then normalized and log2-transformed. Linear dimension reduction is done using principal component analysis (PCA). Groups of cells are identified by building a k-nearest-neighbor (k-NN) graph[21] and clustering it with the Leiden algorithm.[22] Finally, the four matrices are combined using matrix factorization[23] and reduced with UMAP.[24]
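As a rough illustration of the first and third steps, the sketch below scales each cell to a common depth, applies a log2 transform, and then finds each cell's nearest neighbors by Euclidean distance. It is a simplification under stated assumptions: the scale factor and the function names `normalize_log2` and `nearest_neighbors` are hypothetical, distances are computed directly on the transformed values rather than on principal components, and PCA, Leiden clustering, matrix factorization and UMAP are omitted.

```python
import math


def normalize_log2(counts, scale=10_000):
    """Scale each cell (column) to `scale` total counts, then take log2(1 + x).

    counts: list of rows, one row per feature, one column per cell.
    The scale factor is an illustrative assumption.
    """
    n_cells = len(counts[0]) if counts else 0
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [
            math.log2(1 + scale * row[j] / totals[j]) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for row in counts
    ]


def nearest_neighbors(matrix, k):
    """For each cell, return the indices of its k closest other cells."""
    n_cells = len(matrix[0]) if matrix else 0
    cells = [[row[j] for row in matrix] for j in range(n_cells)]
    neighbors = []
    for i in range(n_cells):
        # Sort the other cells by (distance, index); ties go to the lower index.
        ranked = sorted(
            (math.dist(cells[i], cells[j]), j) for j in range(n_cells) if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
    return neighbors


# Toy matrix: 2 features x 3 cells. Cells 0 and 1 have the same proportions
# but different sequencing depth, so they coincide after normalization.
toy = [
    [1, 2, 0],
    [1, 2, 4],
]
norm = normalize_log2(toy, scale=2)
nn = nearest_neighbors(norm, k=1)
print(norm)
print(nn)
```

The toy data show why depth normalization precedes neighbor search: two cells that differ only in the number of reads sequenced become identical after scaling and are each other's nearest neighbor.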

Cell identity annotation

There are two approaches to cell identity annotation: annotation based on feature annotation of ATAC peaks,[25] and annotation based on integration with reference scRNA-seq data.[26]

Applications

Differences between scGET-seq and scATAC-seq

Current

By using the ratio of Tn5 to TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity).[2] By isolating regions that are most dynamic and identifying which transcription factors bind there, chromatin velocity can be used to infer the dynamic epigenetic processes happening within a given cell and the contributions of various transcription factors to those processes.[2]
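The idea of comparing the two signals can be illustrated with a simple per-region, per-cell log ratio. This is only a sketch of the ratio itself, not the published chromatin velocity model; the function name `accessibility_log_ratio` and the pseudocount are assumptions introduced here for illustration.

```python
import math


def accessibility_log_ratio(tn5, tnh, pseudocount=1.0):
    """Return log2((Tn5 + p) / (TnH + p)) for every region and cell.

    tn5, tnh: count matrices of the same shape (rows = regions, columns = cells).
    Positive values mean the Tn5 (open chromatin) signal dominates; negative
    values mean the TnH (H3K9me3-marked) signal dominates. The pseudocount
    avoids division by zero and is an illustrative choice.
    """
    return [
        [
            math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(row_tn5, row_tnh)
        ]
        for row_tn5, row_tnh in zip(tn5, tnh)
    ]


# Toy data: 2 regions x 2 cells.
tn5 = [[7, 0], [1, 3]]
tnh = [[1, 3], [1, 0]]
ratio = accessibility_log_ratio(tn5, tnh)
print(ratio)
```

A static ratio of this kind only describes the current balance between open and compacted signal in a region; inferring the speed and direction of remodelling requires a dynamic model in the spirit of RNA velocity, which this sketch does not attempt.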

Future

Chromatin remodelling precedes changes in gene expression, so measuring it improves the understanding of the trajectories and mechanisms of cellular change.[27][28] Platforms and tools for integrating multimodal data are therefore areas of active research.[29][30][31] Integrating chromatin velocity with RNA velocity, which would add temporal and directional information, has been proposed as a way to reveal more about differentiation pathways.[32][33]

Limitations

scGET-seq shares some of the limitations of scATAC-seq. Both methods require nuclei from samples with high cellular viability.[13] Low viability leads to high background DNA contamination that does not represent authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is no consensus yet on how best to handle these data.[34]

Another limitation is that SNV calls from scGET-seq still need to be validated by bulk genome sequencing. Although mutations detected by scGET-seq correlate highly with those from bulk exome sequencing, scGET-seq fails to capture all exome SNVs.[2]

References

  1. "From reads to insight: a hitchhiker's guide to ATAC-seq data analysis". Genome Biology 21 (1): 22. February 2020. doi:10.1186/s13059-020-1929-3. PMID 32014034. 
  2. "Chromatin Velocity reveals epigenetic dynamics by single-cell profiling of heterochromatin and euchromatin". Nature Biotechnology 40 (2): 235–244. February 2022. doi:10.1038/s41587-021-01031-1. PMID 34635836.
  3. "Chromatin modifications and their function" (in English). Cell 128 (4): 693–705. February 2007. doi:10.1016/j.cell.2007.02.005. PMID 17320507. 
  4. "RNA velocity of single cells". Nature 560 (7719): 494–498. August 2018. doi:10.1038/s41586-018-0414-6. PMID 30089906. Bibcode: 2018Natur.560..494L.
  5. "Chromatin accessibility and the regulatory epigenome". Nature Reviews. Genetics 20 (4): 207–220. April 2019. doi:10.1038/s41576-018-0089-8. PMID 30675018. 
  6. "The accessible chromatin landscape of the human genome". Nature 489 (7414): 75–82. September 2012. doi:10.1038/nature11232. PMID 22955617. Bibcode: 2012Natur.489...75T.
  7. "Heterochromatin as an Important Driver of Genome Organization". Frontiers in Cell and Developmental Biology 8: 579137. 2020. doi:10.3389/fcell.2020.579137. PMID 33072761. 
  8. "The Role of Chromatin Density in Cell Population Heterogeneity during Stem Cell Differentiation". Scientific Reports 7 (1): 13307. October 2017. doi:10.1038/s41598-017-13731-3. PMID 29042584. Bibcode: 2017NatSR...713307G.
  9. "Tumour heterogeneity and resistance to cancer therapies". Nature Reviews. Clinical Oncology 15 (2): 81–94. February 2018. doi:10.1038/nrclinonc.2017.166. PMID 29115304. 
  10. "Tumour heterogeneity and metastasis at single-cell resolution". Nature Cell Biology 20 (12): 1349–1360. December 2018. doi:10.1038/s41556-018-0236-7. PMID 30482943. 
  11. "Research and application of single-cell sequencing in tumor heterogeneity and drug resistance of circulating tumor cells". Biomarker Research 8 (1): 60. November 2020. doi:10.1186/s40364-020-00240-1. PMID 33292625. 
  12. "Single-cell ATAC-seq: strength in numbers". Genome Biology 16 (1): 172. August 2015. doi:10.1186/s13059-015-0737-7. PMID 26294014. 
  13. "Sketching open and closed chromatin". Nature Methods 18 (12): 1448. December 2021. doi:10.1038/s41592-021-01351-9. PMID 34862496.
  14. "Isolation of Nuclei for Single Cell RNA Sequencing & Tissues for Single Cell RNA Sequencing -Demonstrated Protocol -Sample Prep -Single Cell Gene Expression -Official 10x Genomics Support". https://support.10xgenomics.com/single-cell-gene-expression/sample-prep/doc/demonstrated-protocol-isolation-of-nuclei-for-single-cell-rna-sequencing-and-tissues-for-single-cell-rna-sequencing. 
  15. "Chapter 4 - Bioinformatics of Epigenomic Data Generated From Next-Generation Sequencing" (in English). Epigenetics in Human Disease. Translational Epigenetics. 6 (Second ed.). Academic Press. January 2018. pp. 65–106. doi:10.1016/B978-0-12-812215-0.00004-2. ISBN 978-0-12-812215-0.
  16. "Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain". Nature 410 (6824): 120–124. March 2001. doi:10.1038/35065138. PMID 11242054. Bibcode: 2001Natur.410..120B.
  17. "Interactions of HP1 Bound to H3K9me3 Dinucleosome by Molecular Simulations and Biochemical Assays" (in English). Biophysical Journal 114 (10): 2336–2351. May 2018. doi:10.1016/j.bpj.2018.03.025. PMID 29685391. Bibcode: 2018BpJ...114.2336W.
  18. "Structure of the HP1 chromodomain bound to histone H3 methylated at lysine 9". Nature 416 (6876): 103–107. March 2002. doi:10.1038/nature722. PMID 11882902. Bibcode: 2002Natur.416..103N.
  19. "Chromium Single Cell ATAC Reagent Kits User Guide (v1.1 Chemistry) -User Guide -Official 10x Genomics Support". https://support.10xgenomics.com/permalink/7Blfuhe1ZybSuDshAX9gfz. 
  20. "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal 18: 1429–1439. 2020-01-01. doi:10.1016/j.csbj.2020.06.012. PMID 32637041. 
  21. "BBKNN: fast batch alignment of single cell transcriptomes". Bioinformatics 36 (3): 964–965. February 2020. doi:10.1093/bioinformatics/btz625. PMID 31400197. 
  22. "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports 9 (1): 5233. March 2019. doi:10.1038/s41598-019-41695-z. PMID 30914743. Bibcode: 2019NatSR...9.5233T.
  23. "Data Fusion by Matrix Factorization". IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (1): 41–53. January 2015. doi:10.1109/TPAMI.2014.2343973. PMID 26353207. 
  24. "UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction — umap 0.5 documentation". https://umap-learn.readthedocs.io/en/latest/. 
  25. dawe/scatACC, 2022-02-21, https://github.com/dawe/scatACC, retrieved 2022-03-04 
  26. "A comparison of automatic cell identification methods for single-cell RNA sequencing data". Genome Biology 20 (1): 194. September 2019. doi:10.1186/s13059-019-1795-z. PMID 31500660. 
  27. "Transcription factors orchestrate dynamic interplay between genome topology and gene regulation during cell reprogramming". Nature Genetics 50 (2): 238–249. February 2018. doi:10.1038/s41588-017-0030-7. PMID 29335546. 
  28. "Integrative Single-Cell RNA-Seq and ATAC-Seq Analysis of Human Developmental Hematopoiesis". Cell Stem Cell 28 (3): 472–487.e7. March 2021. doi:10.1016/j.stem.2020.11.015. PMID 33352111. 
  29. "scJoint integrates atlas-scale single-cell RNA-seq and ATAC-seq data with transfer learning". Nature Biotechnology 40 (5): 703–710. January 2022. doi:10.1038/s41587-021-01161-6. PMID 35058621. 
  30. "Comprehensive Integration of Single-Cell Data" (in English). Cell 177 (7): 1888–1902.e21. June 2019. doi:10.1016/j.cell.2019.05.031. PMID 31178118. 
  31. "Integrative analyses of single-cell transcriptome and regulome using MAESTRO". Genome Biology 21 (1): 198. August 2020. doi:10.1186/s13059-020-02116-x. PMID 32767996. 
  32. "sciCAN: Single-cell chromatin accessibility and gene expression data integration via Cycle-consistent Adversarial Network" (in English). bioRxiv: 2021.11.30.470677. 2021-12-01. doi:10.1101/2021.11.30.470677. https://www.biorxiv.org/content/10.1101/2021.11.30.470677v1.
  33. "scDVF: Single-cell Transcriptomic Deep Velocity Field Learning with Neural Ordinary Differential Equations" (in English). bioRxiv: 2022.02.15.480564. 2022-02-23. doi:10.1101/2022.02.15.480564. https://www.biorxiv.org/content/10.1101/2022.02.15.480564v2.
  34. "Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation". Computational and Structural Biotechnology Journal 18: 1429–1439. January 2020. doi:10.1016/j.csbj.2020.06.012. PMID 32637041.