Time-resolved RNA sequencing

From HandWiki

Time-resolved RNA sequencing methods are applications of RNA-seq that allow RNA abundances to be observed over time in one or more biological samples. Second-generation DNA sequencing has enabled cost-effective, high-throughput, and unbiased analysis of the transcriptome.[1] Normally, RNA-seq captures only a snapshot of the transcriptome at the time of sample collection.[1] Following a process over time therefore requires sampling at multiple time points, which increases both the monetary and time costs of an experiment. Methodological and technological innovations now allow the transcriptome to be analyzed over time without separate samplings at each time point.

Background

While DNA encodes all of the functional elements of life, the encoded information must be converted into a functional form. Following the central dogma of molecular biology, messenger RNA carries the genetic information for producing proteins, which, alongside functional RNA, carry out the majority of cellular processes required for life.[2] Changes in RNA abundance can be used to measure changes in cellular behavior in response to events such as heat stress, viral infection, or oncogenesis.[3] Knowing how the transcriptome changes during a cellular process allows a greater understanding of the mechanisms underlying it.

Originally, transcriptome-wide RNA abundance could only be assessed using methods such as DNA microarrays or serial analysis of gene expression (SAGE).[4][5] Each method is limited in a different way: microarrays, while cheap, provide inconsistent results,[6] and SAGE is based on Sanger sequencing, which provides limited throughput. Second-generation sequencing does not rely on measuring the relative hybridization of sequences to probes, as microarrays do, or on sequencing short segments, as SAGE does. Instead, a researcher can sequence the bulk RNA within a sample and measure the relative abundance of each type of RNA by comparing the number of times each RNA molecule was sequenced in that sample.
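
The idea of measuring relative abundance by counting sequenced molecules can be illustrated with a minimal sketch. It assumes each read has already been assigned to a gene; the gene names and the counts-per-million scaling are illustrative choices, not part of any specific pipeline described here.

```python
from collections import Counter


def counts_per_million(read_gene_assignments):
    """Convert a list of gene labels (one per sequenced read) to counts per million."""
    counts = Counter(read_gene_assignments)
    total = sum(counts.values())
    # Scale each gene's read count by the total number of reads in the sample.
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Toy data: each entry is the gene a read was assigned to (hypothetical names).
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"] * 1
cpm = counts_per_million(reads)
```

Scaling by the total read count makes samples sequenced to different depths comparable. Real analyses typically apply further normalization, which this sketch omits.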

Normally, in a traditional RNA-seq, microarray, or SAGE experiment, RNA is extracted from a biological sample such as cultured cells and analyzed using the chosen method. The data obtained from such an experiment correspond to the abundance of RNA under the given experimental conditions at the time of harvest. For many applications, such as comparing the abundance of mRNA molecules between cells exposed to a drug and cells not exposed to it, this type of experimental approach is sufficient. However, many cellular processes of scientific and medical interest occur over time, such as cellular differentiation or phagocytosis.[7][8] Studying such processes requires analysis of RNA abundance across a series of time points.

Methods

Time series samples

Sample preparation and data processing

The simplest approach to assessing RNA abundance over time is to use multiple samples that are treated in exactly the same way except for the duration of treatment. For example, to investigate a biological process estimated to take about an hour, a researcher might design an experiment in which the process is triggered for five minutes, 15 minutes, 30 minutes, 45 minutes, one hour, and two hours in separate cell culture samples before the cells are harvested for RNA-seq analysis. The researcher would then have a measurement of the transcriptome at each of these time points, and comparing the samples would indicate which cellular processes are activated and deactivated over time.
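
Comparing samples from such a design can be sketched as follows. The abundance values are invented for illustration, and using the earliest time point as the baseline and adding a pseudocount are assumptions of this sketch rather than requirements of the method.

```python
import math


def log2_fold_changes(abundance_by_time, pseudocount=1.0):
    """Return the log2 fold change of each time point relative to the earliest one."""
    times = sorted(abundance_by_time)
    # The pseudocount avoids taking the logarithm of zero for unexpressed genes.
    baseline = abundance_by_time[times[0]] + pseudocount
    return {
        t: math.log2((abundance_by_time[t] + pseudocount) / baseline)
        for t in times
    }


# Hypothetical normalized abundances for one gene, keyed by minutes of treatment.
gene_x = {5: 99.0, 15: 199.0, 30: 399.0, 45: 399.0, 60: 199.0, 120: 99.0}
lfc = log2_fold_changes(gene_x)
```

In this toy profile the gene rises to a fourfold increase by 30 minutes and returns to its starting level by two hours, the kind of transient activation that a single snapshot would miss.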

Strengths

This method is the most common for measurement of RNA over time in cell culture models, mainly due to its simplicity. Each biological sample need only be processed in exactly the same way, and the factor of time is easily adjusted in most experimental protocols. Furthermore, since each time point is its own sample, more RNA can be harvested and sequenced for a study.

Weaknesses

The requirement of multiple samples for time-resolved data collection increases the cost of the experiment and introduces a greater potential for technical errors. While the price of massively parallel sequencing has decreased greatly since its introduction, large-scale RNA-seq studies are still prohibitively expensive for many laboratories. This issue is compounded by the fact that each additional time point multiplies the number of samples: an experiment with two time points requires twice as many samples as one with a single time point. Consequently, many studies that use time-series RNA-seq are limited in their sample size, which reduces statistical power,[9] in their number of time points, which reduces time resolution, or in both. Finally, requiring a greater number of biological samples increases the risk that human error affects the results, which may lead to spurious conclusions.[10][11]
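
The scaling of sample numbers can be made concrete with a short calculation. The function name and the example numbers of conditions and replicates are hypothetical, chosen only to illustrate the multiplication.

```python
def samples_required(conditions, replicates, time_points):
    """Number of separate samples in a fully crossed time-series design."""
    return conditions * replicates * time_points


# Two conditions with three replicates each (illustrative numbers).
single_time_point = samples_required(conditions=2, replicates=3, time_points=1)
six_time_points = samples_required(conditions=2, replicates=3, time_points=6)
```

Under these assumptions a single time point needs 6 samples, while six time points need 36, so sequencing cost grows in direct proportion to the time resolution.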

Affinity purification

Sample preparation and data processing

In this approach, cells are cultured with tagged nucleotides that allow newly synthesized RNA molecules to be selectively purified. One popular approach is pulse labeling with 4-thiouridine (4-sU), a uridine analogue that is incorporated into newly synthesized RNA.[12] In this type of experiment, a researcher supplements cells with 4-sU at the time of the experimental treatment or shortly beforehand, so that RNA synthesized while the treatment is affecting RNA expression is labeled with 4-sU. The incorporated 4-sU carries a reactive thiol group, which makes it possible to link useful molecules to the labeled RNA.[13] Biotin is a popular choice for this type of assay, as it is inexpensive and binds very strongly and selectively to streptavidin. Incubating the biotinylated RNA with streptavidin-coated beads allows the newly synthesized RNA to be selectively purified. Newly synthesized and total RNA are then sequenced separately and compared.
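
One way the comparison of newly synthesized and total RNA can be used is to estimate how quickly a transcript turns over. The sketch below is a simplification under stated assumptions: it assumes the transcript is at steady state, decays with first-order kinetics, and that the fraction of newly synthesized RNA has already been correctly normalized between the two fractions. None of these assumptions is specified by the protocol described above.

```python
import math


def half_life_from_new_fraction(new_fraction, labeling_minutes):
    """Estimate RNA half-life (minutes) assuming steady state and first-order decay.

    At steady state the newly synthesized fraction after labeling time t is
    1 - exp(-k * t), so the decay rate is k = -ln(1 - fraction) / t.
    """
    if not 0.0 < new_fraction < 1.0:
        raise ValueError("new_fraction must be strictly between 0 and 1")
    decay_rate = -math.log(1.0 - new_fraction) / labeling_minutes
    return math.log(2.0) / decay_rate


# Hypothetical transcript: half of its molecules are labeled after 60 minutes.
half_life = half_life_from_new_fraction(new_fraction=0.5, labeling_minutes=60)
```

If half of a transcript's molecules are new after one hour of labeling, the model returns a half-life of about 60 minutes, as expected.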

Strengths

Affinity purification makes use of the widely used biotin-streptavidin system for fractionating biological materials. Binding of biotin to streptavidin is extremely strong (Kd < 10⁻¹⁴ mol/L).[14] It is also highly specific, which results in minimal background signal from non-specific binding. Furthermore, time resolution is obtained within a single biological sample, which reduces biological variability compared with using a separate sample for each time point.

Weaknesses

The weaknesses of this method mainly concern efficiency. One major difficulty is the uptake of 4-sU into cultured cells. If 4-sU is given too early, it will be incorporated into RNA that was synthesized before the cells began responding to the experimental conditions. If it is given too late, the early stages of the cellular response are not captured. The rate of 4-sU uptake can be measured, but determining the optimal dosage and timing requires additional experiments. Furthermore, these parameters need to be measured in the specific cell lines of interest, as cell lines differ in how quickly they take up 4-sU.

RNA is prone to degradation in vitro, and RNA quality affects RNA-seq results,[15] so experimental protocols involving RNA commonly include steps to reduce the chance of ribonuclease contamination or spontaneous degradation. Metabolic labeling adds a number of laboratory steps that must be performed on RNA in solution. Because the RNA must be kept unfrozen in liquid during these steps, some spontaneous degradation is unavoidable, although usually not enough to affect results. Ribonuclease contamination is the greater risk, as it would render a sample useless and waste time and resources. Researchers working with RNA in any capacity should therefore minimize unnecessary handling.

An additional drawback is that, for an equivalent sample size, more sequencing runs are required than in a time-series experiment. This is because several RNA fractions derived from the same biological sample, such as the newly synthesized and total RNA, must each be sequenced.

Research suggests that 4-sU labeling may result in transcriptional changes on its own, which would affect any results obtained using this method.[16]

Nucleotide conversion

Sample preparation and data processing

Nucleotide conversion methods chemically modify specific nucleotides in newly synthesized RNA so that they are read as different nucleotides during sequencing. SLAMseq and TimeLapse-seq are examples of such approaches.[17][18] As in affinity purification, cells are incubated with 4-sU. After RNA is extracted from the samples, it is treated with iodoacetamide (SLAMseq) or with 2,2,2-trifluoroethylamine and sodium periodate (TimeLapse-seq), which modifies 4-sU so that it is sequenced as cytosine instead of uracil. During sequence alignment and data processing, these U-to-C conversions, which appear as T-to-C mismatches in the sequencing reads, are used to quantify newly synthesized transcripts relative to bulk RNA.[19]
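
The in silico step can be sketched as counting U-to-C conversions, seen as T-to-C mismatches against a DNA reference, in each aligned read. This is a deliberately simplified illustration: it assumes gapless, sense-strand alignments of equal length and treats any read with at least one conversion as labeled. Dedicated tools additionally account for sequencing errors, genetic variants, and strand, which this sketch does not. The sequences and the `min_conversions` threshold are hypothetical.

```python
def count_t_to_c(reference, read):
    """Count positions where the reference has T and the aligned read has C."""
    if len(reference) != len(read):
        raise ValueError("sequences must be the same length (gapless alignment)")
    return sum(
        1
        for ref_base, read_base in zip(reference, read)
        if ref_base == "T" and read_base == "C"
    )


def fraction_labeled(reference, reads, min_conversions=1):
    """Fraction of reads carrying at least min_conversions T-to-C mismatches."""
    labeled = sum(
        1 for read in reads if count_t_to_c(reference, read) >= min_conversions
    )
    return labeled / len(reads)


# Toy alignment: four reads covering the same eight-base reference window.
reference = "ATTGCTTA"
reads = ["ATTGCTTA", "ACTGCTTA", "ATTGCCCA", "ATTGCTTA"]
new_fraction = fraction_labeled(reference, reads)
```

Here two of the four reads carry conversions, so half of the reads would be classified as derived from newly synthesized RNA.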

Strengths

This method shares many strengths with affinity purification, notably that multiple samples are not required for a time series. It also eliminates the need for separate sequencing runs: all RNA is run together on the sequencing instrument, and labeled RNA is separated from unlabeled RNA in silico. Sequencing costs are therefore significantly reduced, as time resolution is obtained without additional samples or additional sequencing runs. Furthermore, because labeled and unlabeled RNA are processed and sequenced together, technical variability introduced by sample processing is reduced, in addition to the reduced biological variability provided by the 4-sU experimental strategy.

Weaknesses

As with its strengths, this method shares many weaknesses with affinity purification, notably the dependence on 4-sU uptake and the increased sample handling. In addition, since methods such as TimeLapse-seq rely on chemical reactions to convert nucleotides, incomplete reactions lead to underestimation of the abundance of newly synthesized RNA and may introduce variability between samples.
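
The underestimation caused by incomplete conversion can be illustrated with a simple binomial model. This is an assumption of the sketch, not a description of any published tool: each uridine in a read from a new transcript is taken to be independently labeled and converted with a hypothetical probability `conversion_rate`, and false positives from sequencing errors are ignored.

```python
def prob_new_read_looks_unlabeled(n_uridines, conversion_rate):
    """Probability that a read from new RNA shows no conversion at all."""
    if n_uridines < 1 or not 0.0 < conversion_rate <= 1.0:
        raise ValueError("need n_uridines >= 1 and 0 < conversion_rate <= 1")
    return (1.0 - conversion_rate) ** n_uridines


def corrected_new_fraction(observed_fraction, n_uridines, conversion_rate):
    """Rescale the observed labeled fraction by the chance of detecting a new read."""
    detectable = 1.0 - prob_new_read_looks_unlabeled(n_uridines, conversion_rate)
    return observed_fraction / detectable


# Toy numbers: reads with two uridines, each converted with probability 0.5.
missed = prob_new_read_looks_unlabeled(n_uridines=2, conversion_rate=0.5)
corrected = corrected_new_fraction(0.3, n_uridines=2, conversion_rate=0.5)
```

With these toy numbers a quarter of the reads from new RNA show no conversion, so an observed labeled fraction of 0.3 corresponds to a true fraction of about 0.4 under the model. Because the conversion rate can differ between samples, the same true fraction can yield different observed values.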

Nascent transcript sequencing

Sample preparation and data processing

Unlike metabolic labeling, nascent transcript sequencing by native elongating transcript sequencing (NET-seq) directly sequences transcripts that are still being transcribed by RNA polymerase II.[20] This allows the dynamics of transcription elongation to be studied, which is not possible with metabolic labeling techniques. In a NET-seq experiment, cells are treated as in a standard RNA-seq experiment until they are lysed. Lysis is performed such that RNA-protein complexes remain intact, and RNA polymerase II is immunoprecipitated from the lysate. RNA that was being transcribed from DNA is still attached to the polymerase; it is subsequently eluted from the polymerase and sequenced.

Strengths

Since NET-seq extracts transcripts that have not completed transcription, the most recently synthesized nucleotide of each transcript can be mapped at single-nucleotide resolution. This is valuable in the study of phenomena such as transcriptional kinetics. Furthermore, it allows the study of unstable transcripts that are degraded shortly after transcription. The general approach of immunoprecipitating RNA-binding proteins also has great utility in understanding other areas of RNA biology, such as splicing.
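
Locating the most recently synthesized nucleotide can be sketched as recording the 3′ end of each aligned nascent transcript. The sketch assumes alignments are given as 0-based, half-open intervals already oriented to the strand of the RNA; the tuple layout and coordinates are hypothetical, and read orientation in practice depends on the library protocol.

```python
from collections import Counter


def polymerase_positions(alignments):
    """Count nascent-RNA 3' ends per genomic position.

    Each alignment is (chromosome, start, end, strand) with 0-based,
    half-open coordinates and the strand of the RNA itself.
    """
    positions = Counter()
    for chrom, start, end, strand in alignments:
        # On the + strand the 3' end is the last aligned base (end - 1);
        # on the - strand it is the first aligned base (start).
        three_prime = end - 1 if strand == "+" else start
        positions[(chrom, three_prime, strand)] += 1
    return positions


# Toy alignments: two + strand transcripts ending at the same base, one on -.
alignments = [
    ("chr1", 100, 150, "+"),
    ("chr1", 120, 150, "+"),
    ("chr1", 200, 240, "-"),
]
positions = polymerase_positions(alignments)
```

Positions where many 3′ ends accumulate would indicate sites where the polymerase is frequently found, for example during pausing.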

Weaknesses

This method relies on immunoprecipitation of RNA polymerase II. Immunoprecipitation has a number of known issues, including non-specific binding interactions that may pull down off-target RNA molecules. The temporal resolution of NET-seq is limited to transcription elongation. While it is possible to compare relative abundances between transcripts using NET-seq, this is not what the method is designed for.

Future directions

Aside from time-series sampling, there are currently no methods for comparing more than two time points. Metabolic labeling experiments can only compare RNA abundances before and after pulse labeling. Being able to observe changes to the transcriptome over a series of time points in a single sample is of interest, as this would provide increased time resolution. Existing metabolic labeling methods are a possible starting point: if several different metabolic labels were applied at different time points, intermediate time points might become accessible. However, such approaches must be developed with care, as biases in labeling methods and sample processing steps could produce misleading results when data from different methods are compared with one another.

Metabolic labeling with 4-sU has been reported to affect cellular phenotype.[16] In current practice, this is unavoidable and is tolerated because the data obtained still fit current biological models and because, in most cases, 4-sU-treated samples are compared with other 4-sU-treated samples. However, it has the potential to produce spurious conclusions, especially if the effect of 4-sU interacts with the chosen experimental condition, because differences in RNA levels caused by the experimental condition cannot then be distinguished from those caused by 4-sU treatment. Identifying labeling chemicals that do not affect cellular phenotype would eliminate these issues altogether.

References

  1. "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews. Genetics 10 (1): 57–63. January 2009. doi:10.1038/nrg2484. PMID 19015660.
  2. "On Protein Synthesis". Symposia of the Society for Experimental Biology, Number XII: The Biological Replication of Macromolecules. Cambridge University Press. 1958. pp. 138–163. 
  3. "RNA sequencing: advances, challenges and opportunities". Nature Reviews. Genetics 12 (2): 87–98. February 2011. doi:10.1038/nrg2934. PMID 21191423. 
  4. "Quantitative monitoring of gene expression patterns with a complementary DNA microarray". Science 270 (5235): 467–470. October 1995. doi:10.1126/science.270.5235.467. PMID 7569999. Bibcode: 1995Sci...270..467S.
  5. "Serial analysis of gene expression". Science 270 (5235): 484–487. October 1995. doi:10.1126/science.270.5235.484. PMID 7570003. Bibcode: 1995Sci...270..484V.
  6. "Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations". BMC Bioinformatics 7: 276. June 2006. doi:10.1186/1471-2105-7-276. PMID 16749918. 
  7. "Dynamics in Transcriptomics: Advancements in RNA-seq Time Course and Downstream Analysis". Computational and Structural Biotechnology Journal 13: 469–477. 2015. doi:10.1016/j.csbj.2015.08.004. PMID 26430493. 
  8. "Time-Course Gene Set Analysis for Longitudinal Gene Expression Data". PLOS Computational Biology 11 (6): e1004310. June 2015. doi:10.1371/journal.pcbi.1004310. PMID 26111374. Bibcode: 2015PLSCB..11E4310H.
  9. "How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?". RNA 22 (6): 839–851. June 2016. doi:10.1261/rna.053959.115. PMID 27022035. 
  10. "RNA-seq: technical variability and sampling". BMC Genomics 12: 293. June 2011. doi:10.1186/1471-2164-12-293. PMID 21645359. 
  11. "RNA-seq differential expression studies: more sequence or more replication?". Bioinformatics 30 (3): 301–304. February 2014. doi:10.1093/bioinformatics/btt688. PMID 24319002. 
  12. "High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay". RNA 14 (9): 1959–1972. September 2008. doi:10.1261/rna.1136108. PMID 18658122. 
  13. "Global quantification of mammalian gene expression control". Nature 473 (7347): 337–342. May 2011. doi:10.1038/nature10098. PMID 21593866. Bibcode: 2011Natur.473..337S. http://edoc.mdc-berlin.de/11664/1/11664oa.pdf.
  14. "Avidin". Advances in Protein Chemistry 29: 85–133. 1975. doi:10.1016/S0065-3233(08)60411-8. ISBN 9780120342297. PMID 237414. 
  15. "RNA-seq: impact of RNA degradation on transcript quantification". BMC Biology 12: 42. May 2014. doi:10.1186/1741-7007-12-42. PMID 24885439. 
  16. "4-thiouridine inhibits rRNA synthesis and causes a nucleolar stress response". RNA Biology 10 (10): 1623–1630. October 2013. doi:10.4161/rna.26214. PMID 24025460.
  17. "Thiol-linked alkylation of RNA to assess expression dynamics". Nature Methods 14 (12): 1198–1204. December 2017. doi:10.1038/nmeth.4435. PMID 28945705. 
  18. "TimeLapse-seq: adding a temporal dimension to RNA sequencing through nucleoside recoding". Nature Methods 15 (3): 221–225. March 2018. doi:10.1038/nmeth.4582. PMID 29355846. 
  19. "Quantification of experimentally induced nucleotide conversions in high-throughput sequencing datasets". BMC Bioinformatics 20 (1): 258. May 2019. doi:10.1186/s12859-019-2849-7. PMID 31109287. 
  20. "Nascent transcript sequencing visualizes transcription at nucleotide resolution". Nature 469 (7330): 368–373. January 2011. doi:10.1038/nature09652. PMID 21248844. Bibcode: 2011Natur.469..368C.