ChIP-sequencing
ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique utilized to study these protein–DNA relations.
Uses
ChIP-seq is used primarily to determine how transcription factors and other chromatin-associated proteins influence phenotype-affecting mechanisms. Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. This epigenetic information is complementary to genotype and expression analysis. ChIP-seq technology is currently seen primarily as an alternative to ChIP-chip, which requires a hybridization array. The array necessarily introduces some bias, because it is restricted to a fixed number of probes. Sequencing, by contrast, is thought to have less bias, although the sequencing bias of different sequencing technologies is not yet fully understood.
Specific DNA sites in direct physical interaction with transcription factors and other proteins can be isolated by chromatin immunoprecipitation. ChIP produces a library of target DNA sites bound to a protein of interest in vivo. Massively parallel sequence analyses are used in conjunction with whole-genome sequence databases to analyze the interaction pattern of any protein with DNA,[1] or the pattern of any epigenetic chromatin modifications. This can be applied to the set of ChIP-able proteins and modifications, such as transcription factors, polymerases and transcriptional machinery, structural proteins, protein modifications, and DNA modifications.[2] To avoid the dependence on specific antibodies, other methods, such as DNase-Seq and FAIRE-Seq, have been developed to find the superset of all nucleosome-depleted or nucleosome-disrupted active regulatory regions in the genome.
Workflow of ChIP-sequencing
ChIP
ChIP is a powerful method to selectively enrich for DNA sequences bound by a particular protein in living cells. However, the widespread use of this method has been limited by the lack of a sufficiently robust method to identify all of the enriched DNA sequences. The ChIP process enriches specific crosslinked DNA-protein complexes using an antibody against the protein of interest. For a good description of the ChIP wet lab protocol, see ChIP-on-chip. Oligonucleotide adaptors are then added to the small stretches of DNA that were bound to the protein of interest to enable massively parallel sequencing.
Sequencing
After size selection, all the resulting ChIP-DNA fragments are sequenced simultaneously using a genome sequencer. A single sequencing run can scan for genome-wide associations with high resolution, meaning that features can be located precisely on the chromosomes. ChIP-chip, by contrast, requires large sets of tiling arrays and still yields lower resolution.
Many sequencing methods can be used in this step. Some technologies use cluster amplification of adapter-ligated ChIP DNA fragments on a solid flow cell substrate to create clusters of approximately 1000 clonal copies each. The resulting high-density array of template clusters on the flow cell surface is then sequenced by the genome sequencer. Each template cluster undergoes sequencing-by-synthesis in parallel using fluorescently labelled reversible terminator nucleotides. Templates are sequenced base-by-base during each read. Then, the data collection and analysis software aligns sample sequences to a known genomic sequence to identify the ChIP-DNA fragments.[citation needed]
Sensitivity
Sensitivity of this technology depends on the depth of the sequencing run (i.e. the number of mapped sequence tags), the size of the genome and the distribution of the target factor. The sequencing depth is directly correlated with cost. If abundant binders in large genomes have to be mapped with high sensitivity, costs are high because a very large number of sequence tags is required. This is in contrast to ChIP-chip, in which the costs are not correlated with sensitivity.
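The relationship between depth, genome size and the number of binding sites can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that background tags fall uniformly at random on the genome and that a fixed fraction of all tags comes from true binding sites. The function names and parameters are hypothetical and are not taken from any cited study.

```python
def expected_background_tags(total_mapped_tags, genome_size, window_size):
    """Expected number of tags in one window if tags were spread
    uniformly at random over the genome (an illustrative assumption)."""
    return total_mapped_tags * window_size / genome_size


def tags_needed(binding_sites, tags_per_site, enriched_fraction):
    """Rough number of mapped tags needed so that each binding site
    collects about `tags_per_site` tags, when only `enriched_fraction`
    of all tags originate from true binding sites (assumed constant)."""
    return binding_sites * tags_per_site / enriched_fraction


if __name__ == "__main__":
    # Hypothetical numbers: 10 million tags, 3 Gb genome, 200 bp windows.
    print(expected_background_tags(10_000_000, 3_000_000_000, 200))
    # Hypothetical factor with 50,000 sites, 20 tags per site, 5% enrichment.
    print(tags_needed(50_000, 20, 0.05))
```

Under these assumptions, the required depth grows linearly with the number of sites to be covered, which is why abundant binders in large genomes are expensive to map.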
Unlike microarray-based ChIP methods, the precision of the ChIP-seq assay is not limited by the spacing of predetermined probes. By integrating a large number of short reads, highly precise binding site localization is obtained. In contrast to ChIP-chip, ChIP-seq data can be used to locate a binding site to within a few tens of base pairs of the actual protein binding site. Tag densities at the binding sites are a good indicator of protein–DNA binding affinity,[3] which makes it easier to quantify and compare binding affinities of a protein to different DNA sites.[4]
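One simple way to see how many short reads combine into a precise location is to extend each read to the estimated fragment length and look for the position of highest coverage. The following is a minimal sketch, assuming reads are given as hypothetical (5' position, strand) pairs and that the fragment length is already known; it is not the procedure of any particular tool.

```python
from collections import Counter


def pileup_summit(reads, fragment_length):
    """Return the position of highest coverage after extending each read.

    reads: iterable of (five_prime_position, strand), strand '+' or '-'.
    Each read is extended to `fragment_length` in its 3' direction, so
    forward reads extend to the right and reverse reads to the left.
    """
    coverage = Counter()
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_length
        else:
            start, end = pos - fragment_length + 1, pos + 1
        for p in range(start, end):
            coverage[p] += 1
    best = max(coverage.values())
    top = [p for p, depth in coverage.items() if depth == best]
    # Midpoint between the leftmost and rightmost top-coverage positions.
    return (min(top) + max(top)) // 2


if __name__ == "__main__":
    # Hypothetical reads flanking a site near position 200.
    reads = [(100, "+"), (105, "+"), (110, "+"),
             (290, "-"), (295, "-"), (300, "-")]
    print(pileup_summit(reads, fragment_length=100))
```

With these made-up reads, the forward and reverse fragments overlap around position 200, which is where the sketch places the summit.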
Current research
STAT1 DNA association: ChIP-seq was used to study STAT1 targets in HeLa S3 cells. The performance of ChIP-seq was then compared to the alternative protein–DNA interaction methods of ChIP-PCR and ChIP-chip.[5]
Nucleosome architecture of promoters: Using ChIP-seq, it was determined that yeast genes seem to have a minimal nucleosome-free promoter region of 150 bp in which RNA polymerase can initiate transcription.[6]
Transcription factor conservation: ChIP-seq was used to compare conservation of TFs in the forebrain and heart tissue in embryonic mice. The authors identified and validated the heart functionality of transcription enhancers, and determined that transcription enhancers for the heart are less conserved than those for the forebrain during the same developmental stage.[7]
Genome-wide ChIP-seq: ChIP-sequencing was completed on the worm C. elegans to explore genome-wide binding sites of 22 transcription factors. Up to 20% of the annotated candidate genes were assigned to transcription factors. Several transcription factors were assigned to non-coding RNA regions and may be subject to developmental or environmental variables. The functions of some of the transcription factors were also identified. Some of the transcription factors regulate genes that control other transcription factors. These genes are not regulated by other factors. Most transcription factors serve as both targets and regulators of other factors, demonstrating a network of regulation.[8]
Inferring regulatory networks: ChIP-seq signals of histone modifications were shown to be more strongly correlated with transcription factor motifs at promoters than RNA levels were.[9] Hence the authors proposed that histone modification ChIP-seq would provide more reliable inference of gene-regulatory networks than other methods based on expression.
ChIP-seq offers an alternative to ChIP-chip. STAT1 experimental ChIP-seq data have a high degree of similarity to results obtained by ChIP-chip for the same type of experiment, with >64% of peaks in shared genomic regions. Because the data are sequence reads, ChIP-seq offers a rapid analysis pipeline (as long as a high-quality genome sequence is available for read mapping, and the genome does not have repetitive content that confuses the mapping process) as well as the potential to detect mutations in binding-site sequences, which may directly support any observed changes in protein binding and gene regulation.
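A figure such as the share of peaks in shared genomic regions can be computed by intersecting two peak lists. The sketch below is one straightforward way to do this, assuming peaks are hypothetical (chromosome, start, end) tuples with half-open coordinates; the cited comparison may have used a different overlap criterion.

```python
def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in `peaks_a` that overlap at least one peak
    in `peaks_b`. Peaks are (chrom, start, end) with end exclusive."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


if __name__ == "__main__":
    # Made-up peak sets standing in for ChIP-seq and ChIP-chip calls.
    seq_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    chip_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
    print(fraction_overlapping(seq_peaks, chip_peaks))
```

The linear scan over candidates keeps the example short; for genome-scale peak lists a sorted sweep or an interval tree would be the usual choice.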
Computational analysis
As with many high-throughput sequencing approaches, ChIP-seq generates extremely large data sets, for which appropriate computational analysis methods are required. To predict DNA-binding sites from ChIP-seq read count data, peak calling methods have been developed. The most popular method[citation needed] is MACS, which empirically models the shift size of ChIP-Seq tags and uses it to improve the spatial resolution of predicted binding sites.[10]
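The idea of modelling the tag shift can be illustrated with a toy example. The sketch below is not the MACS implementation; it assumes a single candidate region, estimates the shift as half the distance between the centers of the forward- and reverse-strand tag clusters, moves tags toward each other by that amount, and scores the region with a Poisson tail probability against a hypothetical background rate. All function names and parameter values are illustrative.

```python
import math
from statistics import median


def estimate_shift(forward_positions, reverse_positions):
    """Half the distance between the centers of the forward- and
    reverse-strand tag clusters (a simplified shift estimate)."""
    return (median(reverse_positions) - median(forward_positions)) / 2


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def call_peak(forward_positions, reverse_positions, background_rate, alpha=1e-5):
    """Return (center, p_value, is_significant) for one candidate region.

    background_rate: expected tag count in the region under no
    enrichment (hypothetical; real tools estimate this from controls).
    """
    shift = estimate_shift(forward_positions, reverse_positions)
    # Forward tags move right, reverse tags move left, toward the site.
    shifted = [p + shift for p in forward_positions]
    shifted += [p - shift for p in reverse_positions]
    center = median(shifted)
    p_value = poisson_tail(len(shifted), background_rate)
    return center, p_value, p_value < alpha


if __name__ == "__main__":
    forward = [90, 95, 100, 105, 110]    # made-up 5' ends, + strand
    reverse = [290, 295, 300, 305, 310]  # made-up 5' ends, - strand
    print(call_peak(forward, reverse, background_rate=0.5))
```

After shifting, tags from both strands pile up at the same location, which is what sharpens the predicted binding site compared with using raw tag positions.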
Another relevant computational problem is differential peak calling, which identifies significant differences between two ChIP-seq signals from distinct biological conditions. Differential peak callers segment two ChIP-seq signals and identify differential peaks using Hidden Markov Models. Examples of two-stage differential peak callers are ChIPDiff[11] and ODIN.[12]
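To show how a Hidden Markov Model can segment two signals into differential and non-differential regions, the following toy example decodes binned tag counts with the Viterbi algorithm. It is a sketch under stated assumptions, not the model used by ChIPDiff or ODIN: the three states, the discretization of the log ratio, and every probability below are hypothetical values chosen for illustration.

```python
import math

STATES = ("same", "higher_in_A", "higher_in_B")

# Hypothetical emission probabilities for the three observation symbols.
EMIT = {
    "same":        {"=": 0.80, "A": 0.10, "B": 0.10},
    "higher_in_A": {"=": 0.20, "A": 0.75, "B": 0.05},
    "higher_in_B": {"=": 0.20, "A": 0.05, "B": 0.75},
}
STAY = 0.9  # hypothetical probability of remaining in the same state


def bin_symbol(count_a, count_b, threshold=1.0):
    """Discretize the log2 ratio of pseudocounted bin counts."""
    ratio = math.log2((count_a + 1) / (count_b + 1))
    if ratio >= threshold:
        return "A"
    if ratio <= -threshold:
        return "B"
    return "="


def transition(src, dst):
    return STAY if src == dst else (1 - STAY) / 2


def viterbi(symbols):
    """Most likely state path for a sequence of symbols (log space)."""
    score = {s: math.log(1 / 3) + math.log(EMIT[s][symbols[0]]) for s in STATES}
    back = []
    for sym in symbols[1:]:
        new_score, pointers = {}, {}
        for dst in STATES:
            prev = max(STATES, key=lambda s: score[s] + math.log(transition(s, dst)))
            new_score[dst] = (score[prev] + math.log(transition(prev, dst))
                              + math.log(EMIT[dst][sym]))
            pointers[dst] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Made-up tag counts per bin for two conditions.
    cond_a = [5, 6, 4, 40, 45, 50, 42, 5, 6, 4]
    cond_b = [5, 5, 5, 5, 6, 5, 4, 5, 6, 5]
    symbols = [bin_symbol(a, b) for a, b in zip(cond_a, cond_b)]
    print(viterbi(symbols))
```

Because switching states is penalized, the decoded path groups neighbouring bins into contiguous segments rather than labelling each bin independently, which is the main reason to use an HMM here.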
See also
- ChIP-on-chip
- ChIP-PET
- ChIP-PCR
Similar methods
- CUT&RUN sequencing, antibody-targeted controlled cleavage by micrococcal nuclease instead of ChIP, allowing for enhanced signal-to-noise ratio during sequencing.
- CUT&Tag sequencing, antibody-targeted controlled cleavage by transposase Tn5 instead of ChIP, allowing for enhanced signal-to-noise ratio during sequencing.
- Sono-Seq, identical to ChIP-Seq but skipping the immunoprecipitation step.
- HITS-CLIP[13][14] (also called CLIP-Seq), for finding interactions with RNA rather than DNA.
- PAR-CLIP, another method for identifying the binding sites of cellular RNA-binding proteins (RBPs).
- RIP-Chip, same goal and first steps, but does not use cross-linking methods and uses a microarray instead of sequencing.
- SELEX, a method for finding a consensus binding sequence
- Competition-ChIP, to measure relative replacement dynamics on DNA.
- ChIRP-Seq, to measure RNA-bound DNA and proteins.
- ChIP-exo, uses exonuclease treatment to achieve up to single base-pair resolution.
- ChIP-nexus, an improved version of ChIP-exo to achieve up to single base-pair resolution.
- DRIP-seq, uses the S9.6 antibody to precipitate three-stranded DNA:RNA hybrids called R-loops.
- TCP-seq, a principally similar method to measure mRNA translation dynamics.
- Calling Cards, uses a transposase to mark the sequence where a transcription factor binds.[15]
References
- ↑ Johnson, DS; et al. (2007). "Genome-wide mapping of in vivo protein–DNA interactions". Science 316 (5830): 1497–1502. doi:10.1126/science.1141319. PMID 17540862. Bibcode: 2007Sci...316.1497J. https://authors.library.caltech.edu/51935/7/Johnson-SOM.revision1.pdf.
- ↑ http://www.illumina.com/Documents/products/datasheets/datasheet_chip_sequence.pdf
- ↑ Jothi et al. (2008) Genome-wide identification of in vivo protein–DNA binding sites from ChIP-seq data. Nucleic Acids Res 36(16) 5221–5231.
- ↑ Bernstein, BE (2005). "Genomic maps and comparative analysis of histone modifications in human and mouse". Cell 120 (2): 169–181. doi:10.1016/j.cell.2005.01.001. PMID 15680324.
- ↑ Robertson, G (2007). "Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing". Nature Methods 4 (8): 651–657. doi:10.1038/nmeth1068. PMID 17558387.
- ↑ Schmid (2007). "ChIP-Seq Data reveal nucleosome architecture of human promoters". Cell 131 (5): 831–832. doi:10.1016/j.cell.2007.11.017. PMID 18045524.
- ↑ Blow, M. J.; McCulley, D. J.; Li, Z.; Zhang, T.; Akiyama, J. A.; Holt, A.; Plajzer-Frick, I.; Shoukry, M. et al. (2010). "ChIP-seq identification of weakly conserved heart enhancers". Nature Genetics 42 (9): 806–810. doi:10.1038/ng.650. PMID 20729851.
- ↑ Niu, W.; Lu, Z. J.; Zhong, M.; Sarov, M.; Murray, J. I.; Brdlik, C. M.; Janette, J.; Chen, C. et al. (2011). "Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans". Genome Research 21 (2): 245–254. doi:10.1101/gr.114587.110. PMID 21177963.
- ↑ Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam (2013). "Uniform, optimal signal processing of mapped deep-sequencing data". Nature Biotechnology.
- ↑ Zhang, Y; Liu, T; Meyer, CA; Eeckhoute, J; Johnson, DS; Bernstein, BE; Nusbaum, C; Myers, RM et al. (2008). "Model-based analysis of ChIP-Seq (MACS)". Genome Biol 9 (9): R137. doi:10.1186/gb-2008-9-9-r137. PMID 18798982.
- ↑ Xu, Sung; Wei; Lin (28 July 2008). "An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data". Bioinformatics 24 (20): 2344–2349. doi:10.1093/bioinformatics/btn402. PMID 18667444.
- ↑ Allhoff, Costa; Sere; Chauvistre; Lin; Zenke (24 October 2014). "Detecting differential peaks in ChIP-seq signals with ODIN". Bioinformatics 30 (24): 3467–3475. doi:10.1093/bioinformatics/btu722. PMID 25371479.
- ↑ "HITS-CLIP yields genome-wide insights into brain alternative RNA processing". Nature 456 (7221): 464–9. November 2008. doi:10.1038/nature07488. PMID 18978773. Bibcode: 2008Natur.456..464L.
- ↑ Darnell RB (2010) HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip Rev RNA. 1):266-86. doi:10.1002/wrna.31
- ↑ Wang, H.; Mayhew, D.; Chen, X.; Johnston, M.; Mitra, R. D. (6 April 2011). "Calling Cards enable multiplexed identification of the genomic targets of DNA-binding proteins". Genome Research 21 (5): 748–755. doi:10.1101/gr.114850.110. PMID 21471402.
External links
- ReMap catalogue: An integrative and uniform ChIP-Seq analysis of regulatory elements from more than 2,800 ChIP-seq datasets, giving a catalogue of 80 million peaks from 485 transcription regulators. The analysis is described in detail in this paper.[1]
- ChIPBase database: a database for exploring transcription factor binding maps from ChIP-Seq data. It provides the most comprehensive ChIP-Seq data set for various cell/tissue types and conditions.
- GeneProf database and analysis tool: GeneProf is a freely accessible, easy-to-use analysis environment for ChIP-seq and RNA-seq data and comes with a large database of ready-analysed public experiments, e.g. for transcription factor binding and histone modifications. The database is described in detail in this paper.
- Differential Peak Calling: Tutorial for differential peak calling with ODIN.
- Bioinformatic analysis of ChIP-seq data: Practical guidelines for the comprehensive analysis of ChIP-seq data are described in this paper.[2]
- KLTepigenome: Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform.
- SignalSpider: a tool for probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles
- FullSignalRanker: a tool for regression and peak prediction on multiple normalized ChIP-Seq signal profiles
- ↑ Chèneby, Jeanne; Gheorghe, Marius; Artufel, Marie; Mathelier, Anthony; Ballester, Benoit (2018). "ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments". Nucleic Acids Research 46 (Database issue): D267–D275. doi:10.1093/nar/gkx1092. PMID 29126285.
- ↑ Bailey, T; et al. (2013). "Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data". PLoS Comput Biol 9 (11): e1003326. doi:10.1371/journal.pcbi.1003326. PMID 24244136. Bibcode: 2013PLSCB...9E3326B.