Biology:List of RNA-Seq bioinformatics tools

From HandWiki

RNA-Seq[1][2][3] is a technique[4] that allows transcriptome studies (see also Transcriptomics technologies) based on next-generation sequencing technologies. This technique depends heavily on bioinformatics tools developed to support the different steps of the process. Listed here are some of the principal tools commonly employed, along with links to some important web resources.

Design

Design is a fundamental step of any RNA-Seq experiment. Important questions, such as the required sequencing depth/coverage and the number of biological or technical replicates, must be carefully considered. A review of experimental design is available.[5]

  • PROPER: PROspective Power Evaluation for RNAseq.
  • RNAtor: an Android Application to calculate optimal parameters for popular tools and kits available for DNA sequencing projects.
  • Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.
  • ssizeRNA Sample Size Calculation for RNA-Seq Experimental Design.

Quality control, trimming, error correction and pre-processing of data

Quality assessment of raw data[6] is the first step of the bioinformatics pipeline of RNA-Seq. It is often necessary to filter the data, removing low-quality sequences or bases (trimming), adapters, contaminants and overrepresented sequences, or correcting errors, to ensure a coherent final result.

Quality control

  • AfterQC - Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data.
  • bam-lorenz-coverage A tool that can generate Lorenz plots and Coverage plots, or export these statistics to text files, directly from BAM file(s).[7]
  • dupRadar[8] An R package which provides functions for plotting and analyzing the duplication rates dependent on the expression levels.
  • FastQC is a quality control tool for high-throughput sequence data (Babraham Institute), developed in Java. It can import data from FASTQ, BAM or SAM files. The tool provides an overview of problematic areas, along with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline solution.
  • fastqp Simple FASTQ quality assessment using Python.
  • Kraken:[9] A set of tools for quality control and analysis of high-throughput sequence data.
  • HTSeq[10] The Python script htseq-qa takes a file with sequencing reads (either raw or aligned reads) and produces a PDF file with useful plots to assess the technical quality of a run.
  • mRIN[11] - Assessing mRNA integrity directly from RNA-Seq data.
  • MultiQC[12] - Aggregate and visualise results from numerous tools (FastQC, HTSeq, RSeQC, TopHat, STAR and others) across all samples into a single report.
  • NGSQC: cross-platform quality analysis pipeline for deep sequencing data.
  • NGS QC Toolkit A toolkit for the quality control (QC) of next generation sequencing (NGS) data. The toolkit comprises user-friendly stand-alone tools for quality control of sequence data generated on the Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering of high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.
  • PRINSEQ is a tool that generates summary statistics of sequence and quality data and is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data.
  • QC-Chain is a package of quality control tools for next generation sequencing (NGS) data, consisting of both raw reads quality evaluation and de novo contamination screening, which could identify all possible contamination sequences.
  • QC3 is a quality control tool for DNA sequencing data, covering raw data, alignment and variant calling.
  • qrqc Quickly scans reads and gathers statistics on base and quality frequencies, read length, and frequent sequences. Produces graphical output of statistics for use in quality control pipelines, and an optional HTML quality report. S4 SequenceSummary objects allow specific tests and functionality to be written around the data collected.
  • RNA-SeQC[13] is a tool with applications in experiment design, process optimization and quality control before computational analysis. It provides three types of quality control: read counts (such as duplicate reads, mapped reads and mapped unique reads, rRNA reads, transcript-annotated reads, strand specificity), coverage (such as mean coverage, mean coefficient of variation, 5’/3’ coverage, gaps in coverage, GC bias) and expression correlation (the tool provides RPKM-based estimation of expression levels). RNA-SeQC is implemented in Java and requires no installation; it can also be run through the GenePattern web interface. The input can be one or more BAM files, and HTML reports are generated as output.
  • RSeQC[14] analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. The input can be SAM, BAM, FASTA or BED files, or a chromosome size file (two-column, plain text). Visualization can be performed in genome browsers such as UCSC, IGB and IGV; R scripts can also be used for visualization.
  • SAMStat[15] identifies problems and reports several statistics at different phases of the process. This tool evaluates unmapped, poorly and accurately mapped sequences independently to infer possible causes of poor mapping.
  • SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data. Originally developed for the Illumina system (historically known as “Solexa”), SolexaQA now also supports Ion Torrent and 454 data.
  • Trim Galore is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional or paired-end sequencing).
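Most of the quality control tools above start from the per-base quality scores stored in FASTQ files. As a minimal sketch of the kind of per-position summary FastQC and similar tools report, the mean quality at each read position can be computed as below. It assumes strict four-line FASTQ records and Phred+33 quality encoding (as used by Sanger and Illumina 1.8+ data); the function names are hypothetical and not part of any tool's API.

```python
# Illustrative sketch: per-position mean base quality from FASTQ.
# Assumes strict 4-line records and Phred+33 encoding; names are hypothetical.
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from strict four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual]


def mean_quality_per_position(records: Iterable[Tuple[str, str, str]]) -> List[float]:
    """Mean quality at each read position; shorter reads contribute fewer positions."""
    sums: List[int] = []
    counts: List[int] = []
    for _header, _seq, qual in records:
        for pos, q in enumerate(phred33(qual)):
            if pos == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += q
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACG\n", "+\n", "II#\n"]
    print(mean_quality_per_position(parse_fastq(fastq)))  # [40.0, 40.0, 21.0, 40.0]
```

Real QC tools go further: they report the full quality distribution per position rather than only the mean, and they detect the quality encoding instead of assuming it.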

Improving the quality

Improving RNA-Seq data quality by correcting bias is a complex subject.[16][17] Each RNA-Seq protocol introduces a specific type of bias, and each step of the process (such as the sequencing technology used) can generate some kind of noise or error. Furthermore, even the species under investigation and the biological context of the samples can influence the results and introduce bias. Many sources of bias have already been reported: GC content and PCR enrichment,[18][19] rRNA depletion,[20] errors produced during sequencing,[21] and priming of reverse transcription by random hexamers.[22]

Different tools have been developed to address each of these sources of error.

Trimming and adapters removal

  • AlienTrimmer[23] implements a very fast approach (based on k-mers) to trim low-quality base pairs and clip technical (alien) oligonucleotides from single- or paired-end sequencing reads in plain or gzip-compressed FASTQ files (for more details, see AlienTrimmer).
  • BBDuk multithreaded tool to trim adapters and filter or mask contaminants based on kmer-matching, allowing a hamming- or edit-distance, as well as degenerate bases. Also performs optimal quality-trimming and filtering, format conversion, contaminant concentration reporting, gc-filtering, length-filtering, entropy-filtering, chastity-filtering, and generates text histograms for most operations. Interconverts between fastq, fasta, sam, scarf, interleaved and 2-file paired, gzipped, bzipped, ASCII-33 and ASCII-64. Keeps pairs together. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies.
  • clean_reads cleans NGS (Sanger, 454, Illumina and SOLiD) reads. It can trim bad-quality regions, adaptors, vectors and regular expressions. It also filters out reads that do not meet a minimum quality criterion based on sequence length and mean quality.
  • condetri[24] is a method for content dependent read trimming for Illumina data using quality scores of each base individually. It is independent from sequencing coverage and user interaction. The main focus of the implementation is on usability and to incorporate read trimming in next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequencing data of arbitrary length.
  • cutadapt[25] removes adapter sequences from next-generation sequencing data (Illumina, SOLiD and 454). It is used especially when the read length of the sequencing machine is longer than the sequenced molecule, as with microRNAs.
  • Deconseq Detect and remove contaminations from sequence data.
  • Erne-Filter[26] is a short string alignment package whose goal is to provide an all-inclusive set of tools to handle short (NGS-like) reads. ERNE comprises ERNE-FILTER (read trimming and contamination filtering), ERNE-MAP (core alignment tool/algorithm), ERNE-BS5 (bisulfite-treated reads aligner), and ERNE-PMAP/ERNE-PBS5 (distributed versions of the aligners).
  • FastqMcf Fastq-mcf attempts to: Detect & remove sequencing adapters and primers; Detect limited skewing at the ends of reads and clip; Detect poor quality at the ends of reads and clip; Detect Ns, and remove from ends; Remove reads with CASAVA 'Y' flag (purity filtering); Discard sequences that are too short after all of the above; Keep multiple mate-reads in sync while doing all of the above.
  • FASTX Toolkit is a set of command-line tools to manipulate reads in FASTA or FASTQ files. These commands make it possible to preprocess the files before mapping with tools like Bowtie. Supported tasks include conversion from FASTQ to FASTA format, quality statistics, removal of sequencing adapters, filtering and cutting of sequences based on quality, and DNA/RNA conversion.
  • Flexbar performs removal of adapter sequences, trimming and filtering features.
  • FreClu improves overall alignment accuracy performing sequencing-error correction by trimming short reads, based on a clustering methodology.
  • htSeqTools is a Bioconductor package able to perform quality control, data processing and visualization. htSeqTools makes it possible to visualize sample correlations, remove over-amplification artifacts, assess enrichment efficiency, correct strand bias and visualize hits.
  • NxTrim Adapter trimming and virtual library creation routine for Illumina Nextera Mate Pair libraries.
  • PRINSEQ[27] generates statistics on sequence data for sequence length, GC content, quality scores, n-plicates, complexity, tag sequences, poly-A/T tails and odds ratios. It can also filter the data, and reformat and trim sequences.
  • Sabre A barcode demultiplexing and trimming tool for FastQ files.
  • Scythe A 3'-end adapter contaminant trimmer.
  • SEECER is a sequencing error correction algorithm for RNA-seq data sets. It takes the raw read sequences produced by a next generation sequencing platform like machines from Illumina or Roche. SEECER removes mismatch and indel errors from the raw reads and significantly improves downstream analysis of the data. Especially if the RNA-Seq data is used to produce a de novo transcriptome assembly, running SEECER can have tremendous impact on the quality of the assembly.
  • Sickle A windowed adaptive trimming tool for FASTQ files using quality.
  • SnoWhite[28] is a pipeline designed to flexibly and aggressively clean sequence reads (gDNA or cDNA) prior to assembly. It takes in and returns fastq or fasta formatted sequence files.
  • ShortRead is a package provided in the R/Bioconductor environment that allows input, manipulation, quality assessment and output of next-generation sequencing data. It supports data manipulation such as filtering out reads based on predefined criteria. ShortRead can be complemented with several other Bioconductor packages (Biostrings, BSgenome, IRanges, and so on) for further analysis and visualization.
  • SortMeRNA is a program tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data. The core algorithm is based on approximate seeds and allows for analyses of nucleotide sequences. The main application of SortMeRNA is filtering ribosomal RNA from metatranscriptomic data.
  • TagCleaner The TagCleaner tool can be used to automatically detect and efficiently remove tag sequences (e.g. WTA tags) from genomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
  • Trimmomatic[29] performs trimming for Illumina platforms and works with FASTQ reads (single- or paired-end). Tasks include cutting adapters, cutting bases at optional positions based on quality thresholds, cutting reads to a specific length, and converting quality scores to Phred-33/64.
  • fastp A tool designed to provide all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported.
  • FASTX-Toolkit The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
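Two of the most common operations performed by the trimmers above are quality trimming and 3' adapter clipping. The following sketch illustrates both under simplifying assumptions: quality scores are already decoded to integers; quality trimming cuts the read at the start of the first window whose mean quality falls below a threshold (similar in spirit to the windowed trimming of Sickle and Trimmomatic, though not their exact rules); and adapters are matched exactly, whereas tools such as cutadapt tolerate mismatches and indels. Function names, defaults and the adapter string are illustrative assumptions.

```python
# Illustrative sketch of windowed quality trimming and exact 3' adapter clipping.
# Function names and default parameters are hypothetical, not any tool's API.
from typing import List, Tuple


def sliding_window_trim(seq: str, quals: List[int], window: int = 4,
                        threshold: float = 20.0) -> Tuple[str, List[int]]:
    """Cut the read at the start of the first window (scanning 5' to 3') whose
    mean quality is below `threshold`. Reads shorter than `window` are unchanged."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < threshold:
            return seq[:start], quals[:start]
    return seq, quals


def clip_3prime_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove an exact full-length adapter match (and everything after it), or an
    adapter prefix of at least `min_overlap` bases at the very 3' end of the read."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Partial adapter at the read end: try the longest possible prefix first.
    for k in range(min(len(adapter) - 1, len(seq)), max(min_overlap, 1) - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # Illumina-style adapter prefix, used here for illustration
    print(clip_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", adapter))  # ACGTACGT
    print(clip_3prime_adapter("ACGTACGTAGATC", adapter))             # ACGTACGT
    print(sliding_window_trim("ACGTACGTAC", [30] * 6 + [10] * 4))   # ('ACGTA', [30, 30, 30, 30, 30])
```

The two functions are deliberately independent. How they are chained, and how the quality list is kept in step with a clipped sequence, is left to the surrounding pipeline.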

Detection of chimeric reads

Recent sequencing technologies normally require DNA samples to be amplified by polymerase chain reaction (PCR). Amplification often generates chimeric elements (especially of ribosomal origin): sequences formed by the joining of two or more original sequences.

  • UCHIME is an algorithm for detecting chimeric sequences.
  • ChimeraSlayer is a chimeric sequence detection utility, compatible with near-full length Sanger sequences and shorter 454-FLX sequences (~500 bp).
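Reference-based chimera detectors such as UCHIME compare a query against candidate parent sequences and ask whether a combination of two parents explains the query better than either parent alone. The sketch below illustrates only that core idea. It assumes the query and both candidate parents are already aligned and of equal length (no indels). It is not the UCHIME scoring model, and `min_gain` is a hypothetical threshold.

```python
# Illustrative sketch of the two-parent chimera idea (not UCHIME's scoring).
# Assumes pre-aligned, equal-length sequences; names and threshold are hypothetical.
from typing import Tuple


def best_split(query: str, left_parent: str, right_parent: str) -> Tuple[int, int]:
    """Best breakpoint bp for the model query[:bp] ~ left_parent, query[bp:] ~ right_parent.
    Returns (bp, number of matching positions under that model)."""
    n = len(query)
    prefix = [0] * (n + 1)  # prefix[i]: matches to left_parent within query[:i]
    for i in range(n):
        prefix[i + 1] = prefix[i] + (query[i] == left_parent[i])
    suffix = [0] * (n + 1)  # suffix[i]: matches to right_parent within query[i:]
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (query[i] == right_parent[i])
    bp = max(range(n + 1), key=lambda b: prefix[b] + suffix[b])
    return bp, prefix[bp] + suffix[bp]


def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: int = 2) -> bool:
    """True if an A|B or B|A two-parent model beats the best single parent
    by at least `min_gain` matching positions."""
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be pre-aligned to equal length")
    single = max(sum(q == a for q, a in zip(query, parent_a)),
                 sum(q == b for q, b in zip(query, parent_b)))
    two_parent = max(best_split(query, parent_a, parent_b)[1],
                     best_split(query, parent_b, parent_a)[1])
    return two_parent - single >= min_gain


if __name__ == "__main__":
    print(looks_chimeric("AAAAACCCCC", "AAAAAAAAAA", "CCCCCCCCCC"))  # True
```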

Error correction

Characterization of high-throughput sequencing errors and their eventual correction.[30]

  • Acacia Error-corrector for pyrosequenced amplicon reads.
  • AllPathsLG error correction.
  • AmpliconNoise[31] AmpliconNoise is a collection of programs for the removal of noise from 454-sequenced PCR amplicons. It involves two steps: the removal of noise from the sequencing itself and the removal of PCR point errors. The project also includes the Perseus algorithm for chimera removal.
  • BayesHammer: Bayesian clustering for error correction. The algorithm is based on Hamming graphs and Bayesian subclustering. Although BayesHammer was designed for single-cell sequencing, it also improves on existing error correction tools for bulk sequencing data.
  • Bless[32] A bloom filter-based error correction solution for high-throughput sequencing reads.
  • Blue[33] Blue is a short-read error-correction tool based on k-mer consensus and context.
  • BFC A sequencing error corrector designed for Illumina short reads. It uses a non-greedy algorithm with a speed comparable to implementations based on greedy methods.
  • Denoiser Denoiser is designed to address issues of noise in pyrosequencing data. Denoiser is a heuristic variant of PyroNoise. Developers of denoiser report a good agreement with PyroNoise on several test datasets.
  • Echo A reference-free short-read error correction algorithm.
  • Lighter: sequencing error correction without counting.
  • LSC LSC uses short Illumina reads to correct errors in long reads.
  • Karect Karect: accurate correction of substitution, insertion and deletion errors for next-generation sequencing data.
  • NoDe NoDe: an error-correction algorithm for pyrosequencing amplicon reads.
  • PyroTagger PyroTagger: A fast, accurate pipeline for analysis of rRNA amplicon pyrosequence data.
  • Quake is a tool to correct substitution sequencing errors in experiments with deep coverage for Illumina sequencing reads.
  • QuorUM: An Error Corrector for Illumina Reads.
  • Rcorrector. Error correction for Illumina RNA-seq reads.
  • Reptile is a software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.
  • Seecer SEquencing Error CorrEction for Rna reads.
  • SGA
  • SOAPdenovo
  • UNOISE
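Several of the correctors above (for example Quake, BFC and Lighter) rely on the k-mer spectrum. K-mers seen many times across the data set are considered "solid", while k-mers seen only rarely are likely to contain sequencing errors. The following is a minimal sketch of that idea. It assumes an in-memory `Counter` of forward-strand k-mers and tries single-base substitutions only. The actual tools use far more efficient data structures (Bless, for instance, is based on a Bloom filter) and statistical models. The function names and the `min_count` threshold are hypothetical.

```python
# Illustrative k-mer spectrum error correction (single substitutions only).
# Names and thresholds are hypothetical; real tools are far more elaborate.
from collections import Counter
from typing import Iterable, List


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read: str, counts: Counter, k: int, min_count: int) -> List[int]:
    """Start positions of k-mers seen fewer than `min_count` times ("weak" k-mers)."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


def correct_read(read: str, counts: Counter, k: int, min_count: int) -> str:
    """Return the read unchanged if all its k-mers are solid; otherwise try every
    single-base substitution and return the first one that makes all k-mers solid."""
    if not weak_positions(read, counts, k, min_count):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if not weak_positions(candidate, counts, k, min_count):
                return candidate
    return read  # no single substitution helps: leave the read as it is


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 10 + ["ACGTAGCA"]  # the last read carries one error
    kmers = count_kmers(reads, k=4)
    print(correct_read("ACGTAGCA", kmers, k=4, min_count=3))  # ACGTTGCA
```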

Bias correction

  • Alpine[34] Modeling and correcting fragment sequence bias for RNA-seq.
  • cqn[35] is a normalization tool for RNA-Seq data, implementing the conditional quantile normalization method.
  • EDASeq[36] is a Bioconductor package to perform GC-Content Normalization for RNA-Seq Data.
  • GeneScissors A comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment.
  • Peer[37] is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods. Applications of PEER have: a) detected batch effects and experimental confounders, b) increased the number of expression QTL findings by threefold, c) allowed inference of intermediate cellular traits, such as transcription factor or pathway activations.
  • RUV[38] is an R package that implements the remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.
  • sva Surrogate Variable Analysis.
  • svaseq removing batch effects and other unwanted noise from sequencing data.
  • SysCall[39] is a classifier tool for the identification and correction of systematic errors in high-throughput sequence data.
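GC-content bias, one of the sources of bias listed earlier, makes read counts depend on the GC fraction of each gene. A simple way to picture within-sample GC normalization (in the spirit of the GC-content normalization offered by EDASeq, but not its implementation) is to group genes into bins by GC content and rescale each bin so that its median count matches the sample-wide median. The sketch below assumes a dictionary of raw counts and a dictionary of GC fractions per gene. The equal-size binning scheme and the function names are illustrative assumptions.

```python
# Illustrative GC-bin median normalization within one sample.
# Not EDASeq's algorithm; binning scheme and names are hypothetical.
from statistics import median
from typing import Dict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin_normalize(counts: Dict[str, float], gc: Dict[str, float],
                     n_bins: int = 4) -> Dict[str, float]:
    """Sort genes by GC fraction, split them into up to `n_bins` equal-size bins,
    and rescale each bin so its median count equals the sample-wide median."""
    if not counts:
        return {}
    genes = sorted(counts, key=lambda g: gc[g])
    bin_size = -(-len(genes) // n_bins)  # ceiling division
    overall = median(counts.values())
    normalized: Dict[str, float] = {}
    for start in range(0, len(genes), bin_size):
        members = genes[start:start + bin_size]
        bin_median = median(counts[g] for g in members)
        factor = overall / bin_median if bin_median > 0 else 1.0
        for g in members:
            normalized[g] = counts[g] * factor
    return normalized


if __name__ == "__main__":
    counts = {"g1": 10, "g2": 20, "g3": 100, "g4": 200}
    gc = {"g1": 0.30, "g2": 0.35, "g3": 0.60, "g4": 0.65}
    print(gc_bin_normalize(counts, gc, n_bins=2))
```

In the example, the low-GC and high-GC bins start with very different medians (15 and 150) and end up on the same scale as the overall median (60). Published methods such as conditional quantile normalization model this dependence more smoothly than fixed bins.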

Other tasks/pre-processing data

Further tasks performed before alignment, namely paired-read mergers.

  • AuPairWise A Method to Estimate RNA-Seq Replicability through Co-expression.
  • BamHash is a checksum based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the files stored and discover any discrepancies. Thus, BamHash can be used to determine if it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without the loss of data.
  • BBMerge Merges paired reads based on overlap to create longer reads, and an insert-size histogram. Fast, multithreaded, and yields extremely few false positives. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies. Distributed with BBMap.
  • Biopieces are a collection of bioinformatics tools that can be pieced together in a very easy and flexible manner to perform both simple and complex tasks. The Biopieces work on a data stream in such a way that the data stream can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services.
  • COPE[40] COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly.
  • DeconRNASeq is an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data.
  • FastQ Screen screens FASTQ-format sequences against a set of databases to confirm that the sequences contain what is expected (such as species content, adapters, vectors, etc.).
  • FLASH is a read pre-processing tool. FLASH combines paired-end reads which overlap and converts them to single long reads.
  • IDCheck
  • ORNA and ORNA Q/K A tool for reducing redundancy in RNA-seq data, which reduces the computational resource requirements of an assembler.
  • PANDASeq is a program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence.
  • PEAR[41] PEAR: Illumina Paired-End reAd mergeR.
  • qRNASeq script The qRNAseq tool can be used to accurately eliminate PCR duplicates from RNA-Seq data if Molecular Indexes™ or other stochastic labels have been used during library prep.
  • SHERA[42] a SHortread Error-Reducing Aligner.
  • XORRO Rapid Paired-End Read Overlapper.
  • DecontaMiner[43] detects contamination in RNA-Seq data.
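Paired-read mergers such as FLASH, PEAR and BBMerge exploit the fact that, when the sequenced fragment is shorter than the two reads combined, the 3' end of read 1 overlaps the reverse complement of read 2. Below is a minimal sketch of overlap-based merging. It assumes the usual forward/reverse paired-end orientation, picks the longest overlap whose mismatch rate is acceptable, keeps read 1's bases inside the overlap, and ignores base qualities. Real mergers use qualities to score overlaps and build the consensus, and they also handle fragments shorter than a single read. Names and thresholds are hypothetical.

```python
# Illustrative overlap-based merging of a paired-end read pair.
# Quality-unaware; names and default thresholds are hypothetical.
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a forward/reverse read pair whose 3' ends overlap.

    read2 is reverse-complemented onto read1's strand. The longest overlap with
    mismatches <= max_mismatch_frac * overlap wins; read1's bases are kept there.
    Returns None if no acceptable overlap of at least min_overlap bases exists.
    """
    mate = revcomp(read2)
    for ov in range(min(len(read1), len(mate)), max(min_overlap, 1) - 1, -1):
        tail, head = read1[-ov:], mate[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            return read1 + mate[ov:]
    return None


if __name__ == "__main__":
    fragment = "GATTACACCGTAGGCTTAACGGATCCAT"           # 28 bp fragment
    r1, r2 = fragment[:18], revcomp(fragment[10:])      # 18 bp reads, 8 bp overlap
    print(merge_pair(r1, r2, min_overlap=5, max_mismatch_frac=0.0) == fragment)  # True
```

Choosing the longest acceptable overlap is only one possible rule. For example, a tool may instead prefer the overlap with the lowest mismatch density, which behaves differently in repetitive sequence.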

Alignment tools

After quality control, the first step of RNA-Seq analysis involves alignment of the sequenced reads to a reference genome (if available) or to a transcriptome database. See also List of sequence alignment software.

Short (unspliced) aligners

Short aligners align continuous reads (not containing gaps resulting from splicing) to a reference genome. Basically, there are two types: 1) those based on the Burrows–Wheeler transform, such as Bowtie and BWA, and 2) those based on seed-extend methods or the Needleman–Wunsch or Smith–Waterman algorithms. The first group (Bowtie and BWA) is many times faster; however, some tools in the second group tend to be more sensitive, generating more correctly aligned reads.

  • BFAST aligns short reads to reference sequences and is designed to remain sensitive in the presence of errors, SNPs, insertions and deletions. BFAST works with the Smith–Waterman algorithm.
  • Bowtie is a short aligner using an algorithm based on the Burrows–Wheeler transform and the FM-index. Bowtie tolerates a small number of mismatches.
  • Bowtie2 Bowtie 2 is a memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly recommended for aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM-index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
  • Burrows–Wheeler Aligner (BWA) BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100 bp, while the other two are for longer sequences ranging from 70 bp to 1 Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, the latest, is generally recommended for high-quality queries as it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100 bp Illumina reads.
  • Short Oligonucleotide Analysis Package (SOAP)
  • GNUMAP performs alignment using a probabilistic Needleman–Wunsch algorithm. This tool is able to handle alignment in repetitive regions of a genome without losing information. The output of the program was developed to make possible easy visualization using available software.
  • Maq first aligns reads to reference sequences and then performs a consensus stage. In the first stage it performs only ungapped alignment and tolerates up to 3 mismatches.
  • Mosaik Mosaik is able to align reads containing short gaps using the Smith–Waterman algorithm, making it well suited to handling SNPs, insertions and deletions.
  • NovoAlign (commercial) is a short aligner for the Illumina platform based on the Needleman–Wunsch algorithm. It is able to deal with bisulphite data. Output is in SAM format.
  • PerM is a software package designed to perform highly efficient genome-scale alignments of hundreds of millions of short reads produced by the ABI SOLiD and Illumina sequencing platforms. PerM can provide full sensitivity for alignments within 4 mismatches for 50 bp SOLiD reads and 9 mismatches for 100 bp Illumina reads.
  • RazerS
  • SEAL uses a MapReduce model to distribute computation across clusters of computers. SEAL uses BWA to perform alignment and Picard MarkDuplicates to detect and remove duplicate reads.
  • segemehl
  • SeqMap
  • SHRiMP employs two techniques to align short reads. Firstly, the q-gram filtering technique based on multiple seeds identifies candidate regions. Secondly, these regions are investigated in detail using Smith–Waterman algorithm.
  • SMALT
  • Stampy combines the sensitivity of hash tables and the speed of BWA. Stampy is designed to align reads containing sequence variation such as insertions and deletions. It can handle reads of up to 4500 bases and presents its output in SAM format.
  • Subread[44] is a read aligner. It uses the seed-and-vote mapping paradigm to determine the mapping location of the read by using its largest mappable region. It automatically decides whether the read should be globally mapped or locally mapped. For RNA-seq data, Subread should be used for the purpose of expression analysis. Subread can also be used to map DNA-seq reads.
  • ZOOM (commercial) is a short aligner of the Illumina/Solexa 1G platform. ZOOM uses extended spaced seeds methodology building hash tables for the reads, and tolerates mismatches and insertions and deletions.
  • WHAM WHAM is a high-throughput sequence alignment tool developed at the University of Wisconsin-Madison. It aligns short DNA sequences (reads) to the whole human genome at a rate of over 1500 million 60 bp reads per hour, which is one to two orders of magnitude faster than the leading state-of-the-art techniques.
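Bowtie and BWA are built on the Burrows–Wheeler transform (BWT) and the FM-index. The toy sketch below shows the core mechanism, backward search, which counts exact occurrences of a pattern by narrowing an interval of sorted suffixes one character at a time, starting from the end of the pattern. It builds the BWT naively from sorted rotations and stores full occurrence tables, which is practical only for tiny texts. The real aligners use compressed, sampled structures and support mismatches and gaps. Class and function names are illustrative.

```python
# Toy Burrows-Wheeler transform and FM-index backward search (exact matches only).
# Naive construction for illustration; names are hypothetical.
from typing import Dict, List


def bwt(text: str) -> str:
    """BWT via sorted rotations (quadratic; toy inputs only).
    `text` must end with a single '$' sentinel, which sorts before A, C, G and T."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with exactly one '$'")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        alphabet = sorted(set(self.last))
        # first_pos[c]: number of characters in the text smaller than c
        self.first_pos: Dict[str, int] = {}
        total = 0
        for c in alphabet:
            self.first_pos[c] = total
            total += self.last.count(c)
        # occ[c][i]: occurrences of c in last[:i]
        self.occ: Dict[str, List[int]] = {c: [0] * (len(self.last) + 1) for c in alphabet}
        for i, ch in enumerate(self.last):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Backward search: number of exact occurrences of `pattern` in the text."""
        lo, hi = 0, len(self.last)  # half-open interval of matching sorted suffixes
        for c in reversed(pattern):
            if c not in self.first_pos:
                return 0
            lo = self.first_pos[c] + self.occ[c][lo]
            hi = self.first_pos[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("ACGTACGT$")
    print(index.count("ACG"))  # 2
```

The search cost depends on the pattern length rather than the genome length once the index is built, which is the property that makes BWT-based short aligners fast.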

Spliced aligners

Many reads span exon-exon junctions and cannot be aligned directly by short aligners, so specific aligners were needed: spliced aligners. Some spliced aligners first employ short aligners to align unspliced/continuous reads (exon-first approach), and then follow a different strategy to align the remaining reads containing spliced regions; normally these reads are split into smaller segments that are mapped independently. See also.[45][46]
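The exon-first, split-read strategy can be illustrated with a toy sketch. The whole read is placed first; if it does not map, it is split at each possible point, and the sketch looks for unique exact placements of both pieces with a plausible intron between them. It assumes exact matching, a single forward-strand genome string, 0-based coordinates, and hypothetical `min_anchor` and `max_intron` parameters. Real spliced aligners tolerate mismatches, use indexes rather than string search, and resolve ambiguous junction positions.

```python
# Toy exon-first spliced alignment by splitting reads (exact matching only).
# Names and parameters are hypothetical; coordinates are 0-based.
from typing import List, Optional, Tuple


def find_all(genome: str, s: str) -> List[int]:
    """All (possibly overlapping) start positions of s in genome."""
    hits, pos = [], genome.find(s)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(s, pos + 1)
    return hits


def spliced_align(read: str, genome: str, min_anchor: int = 5,
                  max_intron: int = 10000) -> Optional[Tuple]:
    """Return ("unspliced", start) for a unique ungapped hit,
    ("spliced", exon1_end, exon2_start) for a unique split placement
    (the intron is genome[exon1_end:exon2_start]), or None."""
    hits = find_all(genome, read)
    if hits:  # exon-first: the read maps without a gap
        return ("unspliced", hits[0]) if len(hits) == 1 else None  # None: multi-mapper
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left = find_all(genome, read[:split])
        right = find_all(genome, read[split:])
        if len(left) == 1 and len(right) == 1:
            exon1_end, exon2_start = left[0] + split, right[0]
            if 0 < exon2_start - exon1_end <= max_intron:
                return ("spliced", exon1_end, exon2_start)
    return None


if __name__ == "__main__":
    genome = "ACGTTGCAAC" + "GT" + "C" * 16 + "AG" + "TTGGAATTCC"  # exon, intron, exon
    print(spliced_align("ACGTTGCAACTTGGAATTCC", genome))  # ('spliced', 10, 30)
    print(spliced_align("GCAACGTCC", genome))             # ('unspliced', 5)
```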

Aligners based on known splice junctions (annotation-guided aligners)

In this case, the detection of splice junctions relies on databases of known junctions. Tools of this type cannot identify new splice junctions. Some of these data come from other expression methods such as expressed sequence tags (EST).

  • Erange is a tool for alignment and data quantification of mammalian transcriptomes.
  • IsoformEx
  • MapAL
  • OSA
  • RNA-MATE is a computational pipeline for alignment of data from the Applied Biosystems SOLiD system. It also provides quality control and trimming of reads. Genome alignments are performed using mapreads, and splice junctions are identified based on a library of known exon-junction sequences. The tool allows visualization of alignments and tag counting.
  • RUM performs alignment through a pipeline that can handle reads with splice junctions, using Bowtie and BLAT. The pipeline first aligns reads against a genome and a transcriptome database with Bowtie. Next, unmapped sequences are aligned to the reference genome using BLAT. In the final step, all alignments are merged to obtain the final alignment. The input files can be in FASTA or FASTQ format. The output is presented in RUM and SAM format.
  • RNASEQR.
  • SAMMate
  • SpliceSeq
  • X-Mate
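Annotation-guided aligners such as RNA-MATE rely on a library of known exon-junction sequences. The sketch below builds such a library from annotated exon coordinates by joining the last `flank` bases of each exon to the first `flank` bases of the next one. It then checks whether a read spans one of those junctions. It assumes 0-based, half-open, sorted exon coordinates on the forward strand and considers only consecutive exons (no exon skipping). The function names are hypothetical.

```python
# Illustrative exon-junction library and junction matching for annotation-guided alignment.
# Assumes 0-based, half-open, sorted exons of one transcript; names are hypothetical.
from typing import Dict, List, Optional, Tuple


def junction_library(genome: str, exons: List[Tuple[int, int]],
                     flank: int) -> Dict[str, Tuple[int, int, int]]:
    """Map each junction sequence to (exon1_end, exon2_start, boundary), where
    boundary is the offset of the junction inside the junction sequence."""
    library: Dict[str, Tuple[int, int, int]] = {}
    for (s1, e1), (s2, e2) in zip(exons, exons[1:]):
        left = genome[max(s1, e1 - flank):e1]    # last `flank` bases of the upstream exon
        right = genome[s2:min(e2, s2 + flank)]   # first `flank` bases of the downstream exon
        library[left + right] = (e1, s2, len(left))
    return library


def match_junction(read: str,
                   library: Dict[str, Tuple[int, int, int]]) -> Optional[Tuple[int, int]]:
    """Return (exon1_end, exon2_start) of the first known junction the read spans."""
    for junction_seq, (e1, s2, boundary) in library.items():
        pos = junction_seq.find(read)
        if pos != -1 and pos < boundary < pos + len(read):  # read crosses the junction
            return e1, s2
    return None


if __name__ == "__main__":
    genome = "ACGTTGCAAC" + "GT" + "C" * 16 + "AG" + "TTGGAATTCC"
    lib = junction_library(genome, [(0, 10), (30, 40)], flank=10)
    print(match_junction("GCAACTTGGA", lib))  # (10, 30)
```

Because the library contains only annotated junctions, a read spanning an unannotated junction is not matched. This is the limitation of this class of tools described above.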

De novo splice aligners

De novo splice aligners allow the detection of new splice junctions without the need for previously annotated information (some of these tools accept annotation as a supplementary option).

  • ABMapper
  • BBMap Uses short kmers to align reads directly to the genome (spanning introns to find novel isoforms) or transcriptome. Highly tolerant of substitution errors and indels, and very fast. Supports output of all SAM tags needed by Cufflinks. No limit to genome size or number of splices per read. Supports Illumina, 454, Sanger, Ion Torrent, PacBio, and Oxford Nanopore reads, paired or single-ended. Does not use any splice-site-finding heuristics optimized for a single taxonomic branch, but rather finds optimally-scoring multi-affine-transform global alignments, and thus is ideal for studying new organisms with no annotation and unknown splice motifs. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies.
  • ContextMap was developed to overcome some limitations of other mapping approaches, such as resolution of ambiguities. The central idea of this tool is to consider reads in gene expression context, improving this way alignment accuracy. ContextMap can be used as a stand-alone program and supported by mappers producing a SAM file in the output (e.g.: TopHat or MapSplice). In stand-alone mode aligns reads to a genome, to a transcriptome database or both.
  • CRAC proposes a novel way of analyzing reads that integrates genomic locations and local coverage, and detects candidate mutations, indels, and splice or fusion junctions in each single read. Importantly, CRAC's predictive performance improves when it is supplied with longer reads (e.g. 200 nt), so it should fit future needs of read analysis.
  • GSNAP
  • GMAP A Genomic Mapping and Alignment Program for mRNA and EST Sequences.
  • HISAT is a spliced alignment program for mapping RNA-seq reads. In addition to one global FM-index that represents a whole genome, HISAT uses a large set of small FM-indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp, and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes), combined with several alignment strategies, enable effective alignment of RNA-seq reads, in particular reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3 GB for the human genome). HISAT was developed based on the Bowtie2 implementation, which it uses to handle most of the operations on the FM-index.
  • HISAT2 is an alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], its developers designed and implemented a graph FM-index (GFM), which they describe as an original approach and, to the best of their knowledge, its first implementation. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
  • HMMSplicer can identify canonical and non-canonical splice junctions in short reads. First, unspliced reads are removed with Bowtie. The remaining reads are then divided in half, one at a time; each half is seeded against a genome, and the exon borders are determined based on a Hidden Markov Model. A quality score is assigned to each junction, which is useful for estimating false positive rates.
  • MapSplice
  • PALMapper
  • Pass[47] aligns gapped and ungapped reads as well as bisulfite sequencing data. It can filter data before alignment (removal of adapters). Pass uses the Needleman–Wunsch and Smith–Waterman algorithms and performs alignment in 3 stages: scanning positions of seed sequences in the genome, testing the contiguous regions and finally refining the alignment.
  • PASSion
  • PASTA
  • QPALMA predicts splice junctions using machine learning algorithms. The training set is a set of spliced reads with quality information and already known alignments.
  • RASER:[48] reads aligner for SNPs and editing sites of RNA.
  • SeqSaw
  • SoapSplice A tool for genome-wide ab initio detection of splice junction sites from RNA-Seq, a method using new generation sequencing technologies to sequence the messenger RNA.
  • SpliceMap
  • SplitSeek
  • SuperSplat was developed to find all types of splice junctions. The algorithm iteratively splits each read into all possible two-chunk combinations and attempts to align each chunk. Output is in "Supersplat" format.

De novo splice aligners that also use annotation optionally

  • MapNext
  • OLego
  • STAR is a tool that employs "sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure", and detects canonical and non-canonical splice junctions as well as chimeric-fusion sequences. It has already been adapted to align long reads (third-generation sequencing technologies) and can reach speeds of 45 million paired reads per hour per processor.[49]
  • Subjunc[44] is a specialized version of Subread. It uses all mappable regions in an RNA-seq read to discover exons and exon-exon junctions, and uses the donor/acceptor splice signals to find the exact splicing locations. In addition to the discovered exon-exon junctions, Subjunc yields full alignments for every RNA-seq read, including exon-spanning reads. Subjunc is intended for junction detection and genomic variation detection in RNA-seq data.
  • TopHat[50] is designed to find de novo junctions. TopHat aligns reads in two steps. First, unspliced reads are aligned with Bowtie, and the aligned reads are assembled with Maq into islands of sequence. Second, the splice junctions are determined from the initially unmapped reads and the possible canonical donor and acceptor sites within the island sequences.
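Both TopHat's second step and Subjunc's use of donor/acceptor signals rely on recognizing canonical splice sites. The sketch below is a simplified illustration rather than either tool's code: it classifies a candidate intron by its terminal dinucleotides, GT...AG on the forward strand, which appears as CT...AC on the forward-strand sequence when the gene lies on the reverse strand. The function names are hypothetical, and non-canonical motifs (such as GC-AG or AT-AC) are deliberately ignored.

```python
# Complement table for uppercase A/C/G/T (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def intron_strand(genome, start, end):
    """Classify the candidate intron genome[start:end] (0-based, half-open).

    Returns "+" for a canonical GT...AG intron on the forward strand,
    "-" for CT...AC (a GT...AG intron read on the reverse strand), and
    None for anything else, including non-canonical motifs.
    """
    if end - start < 4:  # too short to hold both dinucleotides
        return None
    donor, acceptor = genome[start:start + 2], genome[end - 2:end]
    if donor == "GT" and acceptor == "AG":
        return "+"
    if donor == "CT" and acceptor == "AC":
        return "-"
    return None
```

In a TopHat-like workflow, a check of this kind would be applied to candidate intron boundaries between coverage islands before the initially unmapped reads are tested against them.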
Other spliced aligners
  • G.Mo.R-Se is a method that uses RNA-Seq reads to build de novo gene models.

Evaluation of alignment tools

  • AlignerBoost is a generalized software toolkit for boosting Next-Gen sequencing mapping precision using a Bayesian-based mapping quality framework.
  • CADBURE Bioinformatics tool for evaluating aligner performance on your RNA-Seq dataset.
  • QualiMap: Evaluating next generation sequencing alignment data.
  • RNAseqEVAL A collection of tools for evaluating RNA seq mapping.
  • Teaser: Individualized benchmarking and optimization of read mapping results for NGS data.

Normalization, quantitative analysis and differential expression

General tools

These tools perform normalization and calculate the abundance of each gene expressed in a sample.[51] RPKM, FPKM and TPM[52] are some of the units used to quantify expression. Some tools are also designed to study the variability of gene expression between samples (differential expression). Quantitative and differential studies are largely determined by the quality of the read alignment and the accuracy of isoform reconstruction. Several studies comparing differential expression methods are available.[53][54][55]
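As a rough illustration of how these units differ, the sketch below computes RPKM (FPKM is the same formula applied to fragments rather than reads) and TPM from raw counts and feature lengths. It assumes that the total number of mapped reads equals the sum of the supplied counts unless given explicitly, and it uses plain feature lengths rather than the effective lengths some quantifiers use:

RPKM_i = 10^9 · c_i / (L_i · N) and TPM_i = 10^6 · (c_i / L_i) / Σ_j (c_j / L_j), where c is a count, L a length in bases and N the total number of mapped reads.

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """Reads per kilobase of feature per million mapped reads.

    RPKM_i = 1e9 * c_i / (L_i * N); N is assumed to be sum(counts)
    unless the library's total mapped reads are passed explicitly.
    """
    n = sum(counts) if total_mapped is None else total_mapped
    return [c * 1e9 / (length * n) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = 1e6 / sum(rates)
    return [r * scale for r in rates]


counts = [500, 1000, 250]
lengths = [1000, 4000, 500]
print(tpm(counts, lengths))  # [400000.0, 200000.0, 400000.0]
```

Because TPM values sum to one million in every sample, they are easier to compare between samples than RPKM/FPKM, whose per-sample totals vary.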

  • ABSSeq a new RNA-Seq analysis method based on modelling absolute expression differences.
  • ALDEx2 is a tool for comparative analysis of high-throughput sequencing data. ALDEx2 uses compositional data analysis and can be applied to RNAseq, 16S rRNA gene sequencing, metagenomic sequencing, and selective growth experiments.
  • Alexa-Seq is a pipeline for gene expression analysis, transcript-specific expression analysis, exon junction expression and quantitative alternative expression analysis. It provides extensive visualization of alternative expression, along with statistics and graphs.
  • ARH-seq – identification of differential splicing in RNA-seq data.
  • ASC[56]
  • Ballgown
  • BaySeq is a Bioconductor package to identify differential expression using next-generation sequencing data, via empirical Bayesian methods. There is an option of using the "snow" package for parallelisation of computer data processing, recommended when dealing with large data sets.
  • GMNB[57] is a Bayesian method for temporal differential gene expression analysis across different phenotypes or treatment conditions that naturally handles the heterogeneity of sequencing depth in different samples, removing the need for ad hoc normalization.
  • BBSeq
  • BitSeq (Bayesian Inference of Transcripts from Sequencing Data) is an application for inferring expression levels of individual transcripts from sequencing (RNA-Seq) data and estimating differential expression (DE) between conditions.
  • CEDER Accurate detection of differentially expressed genes by combining significance of exons using RNA-Seq.
  • CPTRA The CPTRA package is for analyzing transcriptome sequencing data from different sequencing platforms. It combines advantages of 454, Illumina GAII, or other platforms and can perform sequence tag alignment and annotation, expression quantification tasks.
  • casper is a Bioconductor package to quantify expression at the isoform level. It combines informative data summaries, flexible estimation of experimental biases and statistical precision considerations, which (reportedly) provide substantial reductions in estimation error.
  • Cufflinks/Cuffdiff is appropriate to measure global de novo transcript isoform expression. It performs assembly of transcripts, estimation of abundances and determines differential expression (Cuffdiff) and regulation in RNA-Seq samples.[58]
  • DESeq is a Bioconductor package to perform differential gene expression analysis based on negative binomial distribution.
  • DEGSeq
  • Derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach.
  • DEvis is a powerful, integrated solution for the analysis of differential expression data. Using DESeq2 as a framework, DEvis provides a wide variety of tools for data manipulation, visualization, and project management.
  • DEXSeq is a Bioconductor package that finds differential exon usage between samples based on RNA-Seq exon counts. DEXSeq employs the negative binomial distribution and provides options for visualization and exploration of the results.
  • DEXUS is a Bioconductor package that identifies differentially expressed genes in RNA-Seq data under all possible study designs such as studies without replicates, without sample groups, and with unknown conditions.[59] In contrast to other methods, DEXUS does not need replicates to detect differentially expressed transcripts, since the replicates (or conditions) are estimated by the EM method for each transcript.
  • DGEclust is a Python package for clustering expression data from RNA-seq, CAGE and other NGS assays using a Hierarchical Dirichlet Process Mixture Model. The estimated cluster configurations can be post-processed in order to identify differentially expressed genes and for generating gene- and sample-wise dendrograms and heatmaps.[60]
  • DiffSplice is a method for differential expression detection and visualization that does not depend on gene annotations. The method is based on the identification of alternative splicing modules (ASMs) that diverge between isoforms. A non-parametric test is applied to each ASM to identify significant differential transcription with a measured false discovery rate.
  • EBSeq is a Bioconductor package for identifying genes and isoforms differentially expressed (DE) across two or more biological conditions in an RNA-seq experiment. It also can be used to identify DE contigs after performing de novo transcriptome assembly. While performing DE analysis on isoforms or contigs, different isoform/contig groups have varying estimation uncertainties. EBSeq models the varying uncertainties using an empirical Bayes model with different priors.
  • EdgeR is an R package for differential expression analysis of data from DNA sequencing methods such as RNA-Seq, SAGE or ChIP-Seq. edgeR employs statistical methods based on the negative binomial distribution as a model for count variability.
  • EdgeRun an R package for sensitive, functionally relevant differential expression discovery using an unconditional exact test.
  • EQP The exon quantification pipeline (EQP): a comprehensive approach to the quantification of gene, exon and junction expression from RNA-seq data.
  • ESAT The End Sequence Analysis Toolkit (ESAT) is designed to quantify annotated features from specialized RNA-Seq gene libraries that target the 5' or 3' ends of transcripts.
  • eXpress performs transcript-level RNA-Seq quantification as well as allele-specific and haplotype analysis, and can estimate the abundances of the multiple isoforms present in a gene. Although it can be coupled directly with aligners (like Bowtie), eXpress can also be used with de novo assemblers, so a reference genome is not required. It runs on Linux, Mac and Windows.
  • ERANGE performs alignment, normalization and quantification of expressed genes.
  • featureCounts an efficient general-purpose read quantifier.
  • FDM
  • FineSplice Enhanced splice junction detection and estimation from RNA-Seq data.
  • GFOLD[61] Generalized fold change for ranking differentially expressed genes from RNA-seq data.
  • globalSeq[62] Global test for counts: testing for association between RNA-Seq and high-dimensional data.
  • GPSeq This is a software tool to analyze RNA-seq data to estimate gene and exon expression, identify differentially expressed genes, and differentially spliced exons.
  • IsoDOT – Differential RNA-isoform Expression.
  • Limma Limma powers differential expression analyses for RNA-sequencing and microarray studies.
  • LPEseq accurately tests differential expression with a limited number of replicates.
  • Kallisto "Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build."
  • MATS Multivariate Analysis of Transcript Splicing (MATS).
  • MAPTest provides a general testing framework for differential expression analysis of RNA-Seq time-course experiments. The method is based on a latent negative-binomial Gaussian mixture model, and the proposed test is optimal in terms of maximum average power. It allows not only the identification of traditional DE genes but also the testing of a variety of composite hypotheses of biological interest.[63]
  • MetaDiff Differential isoform expression analysis using random-effects meta-regression.
  • metaseqR is a Bioconductor package that detects differentially expressed genes from RNA-Seq data by combining six statistical algorithms using weights estimated from their performance with simulated data estimated from real data, either public or user-based. In this way, metaseqR optimizes the tradeoff between precision and sensitivity.[64] In addition, metaseqR creates a detailed and interactive report with a variety of diagnostic and exploration plots and auto-generated text.
  • MMSEQ is a pipeline for estimating isoform expression and allelic imbalance in diploid organisms based on RNA-Seq. The pipeline employs tools such as Bowtie, TopHat, ArrayExpressHTS and SAMtools, and edgeR or DESeq can be used to perform differential expression analysis.
  • MultiDE
  • Myrna is a pipeline tool that runs in a cloud environment (Elastic MapReduce) or on a single computer for estimating differential gene expression in RNA-Seq datasets. Bowtie is employed for short-read alignment, and R algorithms for interval calculations, normalization and statistical processing.
  • NEUMA is a tool to estimate RNA abundances using length normalization, based on uniquely aligned reads and mRNA isoform models. NEUMA uses known transcriptome data available in databases like RefSeq.
  • NOISeq is a non-parametric approach for the identification of differentially expressed genes from count data or previously normalized count data. NOISeq empirically models the noise distribution of count changes by contrasting fold-change differences (M) and absolute expression differences (D) for all the features in samples within the same condition.
  • NPEBseq is a nonparametric empirical Bayesian-based method for differential expression analysis.
  • NSMAP allows inference of isoforms as well as estimation of expression levels, without annotation information. Exons are aligned and splice junctions are identified using TopHat. All possible isoforms are then computed by combining the detected exons.
  • NURD an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data.
  • PANDORA An R package for the analysis and result reporting of RNA-Seq data by combining multiple statistical algorithms.
  • PennSeq accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution.
  • QuasR Quantify and Annotate Short Reads in R.
  • RapMap A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes.
  • recursiveCorPlot Correlation based clustering for RNA-seq data (+ ggplot corrplot-like interface - R-package: recursiveCorPlot).[65]
  • RNAeXpress Can be run with Java GUI or command line on Mac, Windows, and Linux. It can be configured to perform read counting, feature detection or GTF comparison on mapped rnaseq data.
  • Rcount simple and flexible RNA-Seq read counting.
  • rDiff is a tool that can detect differential RNA processing (e.g. alternative splicing, polyadenylation or ribosome occupancy).
  • RNASeqPower calculates sample size estimates for RNA-Seq studies (R package).
  • RNA-Skim a rapid method for RNA-Seq quantification at the transcript level.
  • rSeq is a set of tools for RNA-Seq data analysis. It consists of programs that deal with many aspects of RNA-Seq data analysis, such as read quality assessment, reference sequence generation, sequence mapping, and estimation of gene and isoform expression (RPKMs).
  • RSEM
  • rQuant is a web service (a Galaxy installation) that determines abundances of transcripts per gene locus, based on quadratic programming. rQuant is able to evaluate biases introduced by experimental conditions. A combination of tools is employed: PALMapper (read alignment), mTiM and mGene (inference of new transcripts).
  • Salmon is a software tool for computing transcript abundance from RNA-seq data using either an alignment-free (based directly on the raw reads) or an alignment-based (based on pre-computed alignments) approach. It uses an online stochastic optimization approach to maximize the likelihood of the transcript abundances under the observed data. The software itself is capable of making use of many threads to produce accurate quantification estimates quickly. It is part of the Sailfish suite of software, and is the successor to the Sailfish tool.
  • SAJR is a read counter written in Java, together with an R package, for differential splicing analysis. It uses junction reads to estimate exon exclusion and reads mapped within an exon to estimate its inclusion. SAJR models these counts with a GLM with a quasibinomial distribution and uses a log-likelihood test to assess significance.
  • Scotty Performs power analysis to estimate the number of replicates and depth of sequencing required to call differential expression.
  • Seal alignment-free algorithm to quantify sequence expression by matching kmers between raw reads and a reference transcriptome. Handles paired reads and alternate isoforms, and uses little memory. Accepts all common read formats, and outputs read counts, coverage, and FPKM values per reference sequence. Open-source, written in pure Java; supports all platforms with no recompilation and no other dependencies. Distributed with BBMap. (Seal - Sequence Expression AnaLyzer - is unrelated to the SEAL distributed short-read aligner.)
  • semisup[66] Semi-supervised mixture model: detecting SNPs with interactive effects on a quantitative trait
  • Sleuth is a program for analysis of RNA-Seq experiments for which transcript abundances have been quantified with kallisto.
  • SplicingCompass differential splicing detection using RNA-Seq data.
  • sSeq The purpose of this R package is to discover the genes that are differentially expressed between two conditions in RNA-seq experiments.
  • StringTie is an assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. It was designed as a successor to Cufflinks (its developers include some of the Cufflinks developers) and has many of the same features.
  • TIGAR Transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference.
  • TimeSeq Detecting Differentially Expressed Genes in Time Course RNA-Seq Data.
  • TPMCalculator[67] one-step software to quantify mRNA abundance of genomic features.
  • WemIQ is a software tool to quantify isoform expression and exon splicing ratios from RNA-seq data accurately and robustly.
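The pseudoalignment idea quoted above for Kallisto can be illustrated with a deliberately naive sketch: index every k-mer of each transcript, then intersect the transcript sets of a read's k-mers to obtain the set of transcripts the read is compatible with. Kallisto itself uses a transcriptome de Bruijn graph and many optimizations; this sketch instead assumes a plain dictionary index, considers only the forward strand, and skips k-mers that are absent from the index. The function names are hypothetical.

```python
def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Return the transcripts compatible with every indexed k-mer of `read`.

    Simplifications (assumptions of this sketch): forward strand only,
    and k-mers missing from the index are skipped.
    """
    hits = [index[read[i:i + k]]
            for i in range(len(read) - k + 1)
            if read[i:i + k] in index]
    if not hits:
        return set()
    return set.intersection(*hits)


transcripts = {"tx1": "ACGTACGGTTCA", "tx2": "ACGTACGGAAGT"}
idx = build_kmer_index(transcripts, k=5)
print(sorted(pseudoalign("ACGTACGG", idx, k=5)))  # ['tx1', 'tx2']
print(sorted(pseudoalign("TACGGTTC", idx, k=5)))  # ['tx1']
```

Reads that pseudoalign to the same set of transcripts form an equivalence class; a quantifier such as Kallisto then distributes the counts of each class among its transcripts with an EM algorithm.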

Evaluation of quantification and differential expression

  • CompcodeR RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods.
  • DEAR-O Differential Expression Analysis based on RNA-seq data – Online.
  • PROPER comprehensive power evaluation for differential expression using RNA-seq.
  • RNAontheBENCH computational and empirical resources for benchmarking RNAseq quantification and differential expression methods.
  • rnaseqcomp Several quantitative and visualized benchmarks for RNA-seq quantification pipelines. Two-condition quantifications for genes, transcripts, junctions or exons produced by each pipeline, with the necessary meta information, should be organized into numeric matrices before the evaluation can proceed.

Multi-tool solutions

  • DEB is a web interface/pipeline for comparing the significantly expressed genes reported by different tools. Three algorithms are currently available: edgeR, DESeq and baySeq.
  • SARTools A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data.

Transposable Element expression

  • TeXP is a Transposable Element quantification pipeline that deconvolves pervasive transcription from autonomous transcription of LINE-1 elements.[68]

Workbench (analysis pipeline / integrated solutions)

Commercial solutions

  • ActiveSite by Cofactor Genomics
  • Avadis NGS (currently Strand NGS)
  • BaseSpace by Illumina
  • Biowardrobe an integrated platform for analysis of epigenomics and transcriptomics data.
  • BBrowser a platform for analyzing public and in-house single-cell transcriptomics data
  • CLC Genomics Workbench
  • DNASTAR
  • ERGO
  • Genedata
  • GeneSpring GX
  • Genevestigator by Nebion (the basic version is free for academic researchers).
  • geospiza
  • Golden Helix
  • Maverix Biomics
  • NextGENe
  • OmicsOffice
  • Omics Playground[69] by BigOmics Analytics. A user-friendly platform for the analysis of RNA-Seq and proteomics data. Free up to three datasets.
  • Partek Flow Comprehensive single cell analysis within an intuitive interface.
  • Qlucore. Easy to use for analysis and visualization. One-button import of BAM files.
  • VulcanPlotAI. Brings AI into volcano plots.

Open (free) source solutions

  • ArrayExpressHTS is a BioConductor package that allows preprocessing, quality assessment and estimation of expression of RNA-Seq datasets. It can be run remotely at the European Bioinformatics Institute cloud or locally. The package makes use of several tools: ShortRead (quality control), Bowtie, TopHat or BWA (alignment to a reference genome), SAMtools format, Cufflinks or MMSEQ (expression estimation).
  • BioJupies is a web-based platform that provides complete RNA-seq analysis solution from free alignment service to a complete data analysis report delivered as an interactive Jupyter Notebook.
  • BioQueue is a web-based queue engine designed primarily to improve the efficiency and robustness of job execution in bioinformatics research by estimating the system resources required by a given job. BioQueue also aims to promote the accessibility and reproducibility of data analysis in biomedical research. Implemented in Python 2.7, BioQueue works on both POSIX-compatible systems (Linux, Solaris, OS X, etc.) and Windows. See also.[70]
  • BioWardrobe is an integrated package for the analysis of ChIP-Seq and RNA-Seq datasets through a user-friendly, web-based GUI. For RNA-Seq, BioWardrobe performs mapping, quality control, RPKM estimation and differential expression analysis between samples (or groups of samples). Results of differential expression analysis can be integrated with ChIP-Seq data to build average tag density profiles and heat maps. The package makes use of several open-source tools, including STAR and DESeq. See also.[71]
  • Chipster is a user-friendly analysis software for high-throughput data. It contains over 350 analysis tools for next generation sequencing (NGS), microarray, proteomics and sequence data. Users can save and share automatic analysis workflows, and visualize data interactively using a built-in genome browser and many other visualizations.
  • DEWE (Differential Expression Workflow Executor) is an open-source desktop application that provides a user-friendly GUI for executing differential expression analyses of RNA-Seq data. DEWE currently provides two differential expression analysis workflows: HISAT2, StringTie and Ballgown; and Bowtie2, StringTie and R libraries (Ballgown and edgeR). It runs on Linux, Windows and Mac OS X.
  • easyRNASeq calculates the coverage of high-throughput short reads against a reference genome and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or with the 'DESeq' or 'edgeR' packages.
  • ExpressionPlot
  • FASTGenomics is an online platform to share single-cell RNA sequencing data and analyses using reproducible workflows. Gene expression data can be shared meeting European data protection standards (GDPR). FASTGenomics enables the user to upload their own data and generate customized and reproducible workflows for the exploration and analysis of gene expression data (Scholz et al. 2018).
  • FX is a user-Friendly RNA-Seq gene eXpression analysis tool, empowered by the concept of cloud computing. With FX, users simply upload raw RNA-Seq FASTQ data to the cloud and let the cloud infrastructure do the heavy analysis.
  • Galaxy: Galaxy is a general purpose workbench platform for computational biology.
  • GENE-Counter is a Perl pipeline for RNA-Seq differential gene expression analyses. GENE-Counter performs alignments with CASHX, Bowtie, BWA or any other aligner that produces SAM output. Differential gene expression is run with one of three optional packages (NBPSeq, edgeR and DESeq) using negative binomial distribution methods. Results are stored in a MySQL database to allow additional analyses.
  • GenePattern is a freely available online platform that provides access to RNA-Seq analysis methods without the need for programming.
  • GeneProf Freely accessible, easy to use analysis pipelines for RNA-seq and ChIP-seq experiments.
  • GREIN is an interactive web platform for re-processing and re-analyzing GEO RNA-seq data. GREIN is powered by the back-end computational pipeline for uniform processing of RNA-seq data and the large number (>5,800) of already processed data sets. The front-end user friendly interfaces provide a wealth of user-analytics options including sub-setting and downloading processed data, interactive visualization, statistical power analyses, construction of differential gene expression signatures and their comprehensive functional characterization, connectivity analysis with LINCS L1000 data, etc.
  • GT-FAR is an RNA-seq pipeline that performs RNA-seq QC, alignment, reference-free quantification, and splice variant calling. It filters, trims, and sequentially aligns reads to gene models, predicts and validates new splice junctions, then quantifies expression for each gene, exon, and known/novel splice junction, and performs variant calling.
  • MultiExperiment Viewer (MeV) is suitable to perform analysis, data mining and visualization of large-scale genomic data. The MeV modules include a variety of algorithms to execute tasks like Clustering and Classification, Student's t-test, Gene Set Enrichment Analysis or Significance Analysis. MeV runs on Java.
  • NGSUtils is a suite of software tools for working with next-generation sequencing datasets.
  • Rail-RNA Scalable analysis of RNA-seq splicing and coverage.
  • RAP RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.
  • RSEQtools "RSEQtools consists of a set of modules that perform common tasks such as calculating gene expression values, generating signal tracks of mapped reads, and segmenting that signal into actively transcribed regions. In addition to the anonymization afforded by this format it also facilitates the decoupling of the alignment of reads from downstream analyses."
  • RobiNA provides a graphical user interface for working with R/Bioconductor packages. RobiNA provides a package that automatically installs all required external tools (R/Bioconductor frameworks and Bowtie). This tool offers a variety of quality control methods and can produce many tables and plots supplying detailed results for differential expression. Furthermore, the results can be visualized and manipulated with MapMan and PageMan. RobiNA runs on Java version 6.
  • RseqFlow is an RNA-Seq analysis pipeline which offers an express implementation of analysis steps for RNA sequencing datasets. It can perform pre and post mapping quality control (QC) for sequencing data, calculate expression levels for uniquely mapped reads, identify differentially expressed genes, and convert file formats for ease of visualization.
  • S-MART handles mapped RNA-Seq data and mainly performs data manipulation (selection/exclusion of reads, clustering and differential expression analysis) and visualization (read information, distribution, comparison with epigenomic ChIP-Seq data). It can be run on any laptop by a person without a computing background, and a friendly graphical user interface makes the tools easy to operate.
  • Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
  • TCW is a Transcriptome Computational Workbench.
  • TRAPLINE a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation.
  • ViennaNGS A toolbox for building efficient next- generation sequencing analysis pipelines.
  • wapRNA is a free web-based application for processing high-throughput RNA-Seq data from next-generation sequencing (NGS) platforms, such as the Genome Analyzer of Illumina Inc. (Solexa) and SOLiD of Applied Biosystems. wapRNA provides an integrated tool for RNA-Seq, which refers to the use of high-throughput sequencing technologies to sequence cDNAs in order to obtain information about a sample's RNA content.

Alternative splicing analysis

General tools

  • Alternative Splicing Analysis Tool Package (ASATP) includes a series of toolkits for analyzing alternative splicing events, which can be used to detect and visualize alternative splicing events, check ORF changes, assess the regulation of alternative splicing and perform statistical analysis.
  • Asprofile is a suite of programs for extracting, quantifying and comparing alternative splicing (AS) events from RNA-seq data.
  • AStalavista The AStalavista web server extracts and displays alternative splicing (AS) events from a given genomic annotation of exon-intron gene coordinates. By comparing all given transcripts, AStalavista detects the variations in their splicing structure and identifies all AS events (such as exon skipping or alternative donor) by assigning each of them an AS code.
  • CLASS2 accurate and efficient splice variant annotation from RNA-seq reads.
  • Cufflinks/Cuffdiff
  • DEXseq Inference of differential exon usage in RNA-Seq.
  • Diceseq Statistical modeling of isoform splicing dynamics from RNA-seq time series data.
  • EBChangepoint An empirical Bayes change-point model for identifying 3′ and 5′ alternative splicing by RNA-Seq.
  • Eoulsan A versatile framework dedicated to high-throughput sequencing data analysis. It allows automated analysis (mapping, counting and differential analysis with DESeq2).
  • GESS for de novo detection of exon-skipping event sites from raw RNA-seq reads.
  • LeafCutter a suite of novel methods for the identification and quantification of novel and existing alternative splicing events by focusing on intron excisions.
  • LEMONS[72] A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes.
  • MAJIQ. Modeling Alternative Junction Inclusion Quantification.
  • MATS Multivariate Analysis of Transcript Splicing (MATS).
  • MISO quantifies the expression level of splice variants from RNA-Seq data and is able to recognize differentially regulated exons/isoforms across different samples. MISO uses a probabilistic method (Bayesian inference) to calculate the probability of each read's origin.
  • Rail-RNA Scalable analysis of RNA-seq splicing and coverage.
  • RPASuite[73] RPASuite (RNA Processing Analysis Suite) is a computational pipeline to identify differentially and coherently processed transcripts using RNA-seq data obtained from multiple tissue or cell lines.
  • RSVP RSVP is a software package for prediction of alternative isoforms of protein-coding genes, based on both genomic DNA evidence and aligned RNA-seq reads. The method is based on the use of ORF graphs, which are more general than the splice graphs used in traditional transcript assembly.
  • SAJR calculates the number of reads that confirm the inclusion or exclusion of a segment (the part of a gene between the two nearest splice sites) and then models these counts with a GLM with a quasibinomial distribution to account for biological variability.
  • SGSeq An R package for de novo prediction of splicing events.
  • SplAdder Identification, quantification and testing of alternative splicing events from RNA-Seq data.
  • SpliceGrapher Prediction of novel alternative splicing events from RNA-Seq data. Also includes graphical tools for visualizing splice graphs.[74][75]
  • SpliceJumper a classification-based approach for calling splicing junctions from RNA-seq data.
  • SplicePie is a pipeline to analyze non-sequential and multi-step splicing. SplicePie contains three major analysis steps: analyzing the order of splicing per sample, looking for recursive splicing events per sample, and summarizing predicted recursive splicing events for all analyzed samples (using more samples is recommended for higher reliability). The first two steps are performed individually on each sample and the last step looks at the overlap across all samples. However, the analysis can also be run on a single sample.
  • SplicePlot is a tool for visualizing alternative splicing and the effects of splicing quantitative trait loci (sQTLs) from RNA-seq data. It provides a simple command line interface for drawing sashimi plots, hive plots, and structure plots of alternative splicing events from .bam, .gtf, and .vcf files.
  • SpliceR An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data.
  • SpliceSEQ SpliceViewer is a Java application that allows researchers to investigate alternative mRNA splicing patterns in data from high-throughput mRNA sequencing studies. Sequence reads are mapped to splice graphs that unambiguously quantify the inclusion level of each exon and splice junction. The graphs are then traversed to predict the protein isoforms that are likely to result from the observed exon and splice junction reads. UniProt annotations are mapped to each protein isoform to identify potential functional impacts of alternative splicing.
  • SpliceTrap[76] is a statistical tool for the quantification of exon inclusion ratios from RNA-seq data.
  • Splicing Express – a software suite for alternative splicing analysis using next-generation sequencing data.
  • SUPPA This tool generates different alternative splicing (AS) events and calculates the PSI ("Percent Spliced In") value for each event, using transcript abundance estimates from multiple samples.
  • SwitchSeq identifies extreme changes in splicing (switch events).
  • Portcullis identification of genuine splice junctions.
  • TrueSight A Self-training Algorithm for Splice Junction Detection using RNA-seq.
  • Vast-tools A toolset for profiling alternative splicing events in RNA-Seq data.
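The PSI value mentioned for SUPPA can be illustrated with a minimal sketch. It assumes that an event is defined by one set of transcripts that include the alternative region and one set that skip it, and that PSI is the inclusion share of their combined abundance. The function name, transcript IDs and TPM values are hypothetical; this is not SUPPA's code.

```python
def psi(abundance, inclusion, exclusion):
    """Percent Spliced In for one event in one sample.

    abundance: dict mapping transcript ID -> abundance (e.g. TPM)
    inclusion: IDs of transcripts that contain the alternative region
    exclusion: IDs of transcripts that skip it
    Returns None when none of the event's transcripts is expressed.
    """
    inc = sum(abundance.get(t, 0.0) for t in inclusion)
    exc = sum(abundance.get(t, 0.0) for t in exclusion)
    total = inc + exc
    return inc / total if total > 0 else None


# Hypothetical transcripts and TPM values for two samples.
samples = {
    "sample_A": {"tx1": 30.0, "tx2": 10.0, "tx3": 60.0},
    "sample_B": {"tx1": 5.0, "tx2": 5.0, "tx3": 10.0},
}
event_inclusion = ["tx1", "tx2"]
event_exclusion = ["tx3"]

# One PSI value per sample for the same event.
psi_per_sample = {
    name: psi(tpm, event_inclusion, event_exclusion)
    for name, tpm in samples.items()
}
```

Here sample_A yields a PSI of 0.4 and sample_B a PSI of 0.5; comparing such per-sample values between conditions is the basis of differential splicing tests.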

Intron retention analysis

  • IRcall / IRclassifier IRcall is a computational tool for IR event detection from RNA-Seq data. IRclassifier is a supervised machine learning-based approach for IR event detection from RNA-Seq data.

Differential isoform/transcript usage

  • IsoformSwitchAnalyzeR IsoformSwitchAnalyzeR is an R package that enables statistical identification of isoform switches with predicted functional consequences. The consequences of interest can be chosen from a long list that includes gain/loss of protein domains, changes in signal peptides and changes in NMD sensitivity.[77] IsoformSwitchAnalyzeR is made for post-analysis of data from any full-length isoform/transcript quantification tool, but directly supports Cufflinks/Cuffdiff, RSEM, Kallisto and Salmon.
  • DRIMSeq An R package that utilizes generalized linear modeling (GLM) to identify isoform switches from estimated isoform count data.[78]
  • BayesDRIMSeq An R package containing a Bayesian implementation of DRIMSeq.[79]
  • Cufflinks/Cuffdiff Full-length isoform/transcript quantification and differential analysis tool which, among other tests, tests for changes in the usage of isoforms belonging to the same primary transcript (sharing a TSS) via a one-sided t-test based on the asymptotic distribution of the Jensen-Shannon metric.[58]
  • rSeqNP An R package that implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data.[80]
  • Isolator Full length isoform/transcript quantification and differential analysis tool which analyses all samples in an experiment in unison using a simple Bayesian hierarchical model. Can identify differential isoform usage by testing for probability of monotonic splicing.[81]
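The Jensen-Shannon metric used by Cuffdiff to compare isoform usage can be computed as in the minimal sketch below. It assumes isoform abundances within one group (for example, isoforms sharing a TSS) are first converted to proportions, and uses entropy in bits so that the metric lies between 0 and 1. It only computes the distance; Cuffdiff's one-sided t-test on the metric's asymptotic distribution is not reproduced, and the abundance values are hypothetical.

```python
import math


def proportions(abundances):
    """Convert isoform abundances (e.g. FPKM) to relative usage within a group."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("isoform group has no expression")
    return [a / total for a in abundances]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_metric(p, q):
    """Square root of the Jensen-Shannon divergence between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


# Hypothetical isoforms sharing one TSS, measured in two conditions.
condition_1 = proportions([90.0, 10.0])
condition_2 = proportions([10.0, 90.0])
shift = js_metric(condition_1, condition_2)
```

Identical usage gives a metric of 0, while two conditions that each express a different isoform exclusively give 1; the example above falls in between.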

Fusion genes/chimeras/translocation finders/structural variations

Genomic rearrangements resulting from diseases such as cancer can produce aberrant genetic modifications like fusions or translocations. Identifying these modifications plays an important role in carcinogenesis studies.[82]

  • Arriba[83] is a fusion detection algorithm based on the STAR[49] RNA-Seq aligner. It won the DREAM Challenge on fusion detection.[84] Arriba can also detect viral integration sites, internal tandem duplications, whole exon duplications, circular RNAs, enhancer hijacking events involving immunoglobulin/T-cell receptor loci, and breakpoints in introns or intergenic regions.
  • Bellerophontes [85]
  • BreakDancer [86]
  • BreakFusion [87]
  • ChimeraScan [88]
  • EBARDenovo [89]
  • Easyfuse is a pipeline that combines STAR-Fusion[90] and Fusioncatcher[91] to detect fusion transcripts from RNA-seq data with high accuracy. An older version (1.3.7) also includes InFusion,[92] MapSplice2[93] and SoapFuse[94] to detect fusions with maximal sensitivity.[95]
  • EricScript [96]
  • DEEPEST is a statistical fusion detection algorithm.[97] DEEPEST can also detect circular RNAs.
  • DeFuse DeFuse is a software package for gene fusion discovery using RNA-Seq data.[98]
  • Dr. Disco Dr. Disco is a fusion detector that takes the entire reference genome into account and can therefore also detect genomic breakpoints. This makes it particularly suited for rRNA-minus RNA-seq.[7]
  • egfr-v3-determiner EGFR-v3-determiner is a tool that counts EGFRvIII and EGFRwt splice/structural variants directly from alignment files.[99]
  • FusionAnalyser FusionAnalyser uses paired reads mapping to different genes (bridge reads).[100]
  • FusionCatcher FusionCatcher searches for novel/known somatic fusion genes, translocations, and chimeras in RNA-seq data (stranded/unstranded paired-end reads from Illumina NGS platforms) from diseased samples.[91]
  • FusionHunter identifies fusion transcripts without depending on already known annotations. It uses Bowtie as a first aligner and paired-end reads.
  • FusionMap FusionMap is a fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions. It detects and characterizes fusion junctions at base-pair resolution. FusionMap can be applied to detect fusion junctions in both single- and paired-end datasets from either gDNA-Seq or RNA-Seq studies.[101]
  • FusionSeq [102]
  • InFusion[92]
  • JAFFA compares a transcriptome against a reference transcriptome, rather than using the genome-centric approach of other fusion finders.[103]
  • MapSplice[93]
  • nFuse[104]
  • Oncomine NGS RNA-Seq Gene Expression Browser.
  • PRADA [105]
  • SOAPFuse detects fusion transcripts from human paired-end RNA-Seq data. It outperforms five other similar tools in both computation and fusion detection performance using both real and simulated data.[94]
  • SOAPfusion [106]
  • STAR-Fusion [90]
  • TopHat-Fusion is based on TopHat and was developed to handle reads resulting from fusion genes. It does not require previous data about known genes and uses Bowtie to align continuous reads.[107]
  • ViralFusionSeq [108] is a high-throughput sequencing (HTS) tool for discovering viral integration events and reconstructing fusion transcripts at single-base resolution.
  • ViReMa (Viral Recombination Mapper) detects and reports recombination or fusion events in and between virus and host genomes using deep sequencing datasets.[109]
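The bridge-read idea mentioned for FusionAnalyser (read pairs whose mates map to different genes) can be sketched as a simple count over gene pairs. This sketch assumes each mate has already been assigned to a gene from its alignment and an annotation, and the `min_support` threshold is a hypothetical parameter; real fusion finders additionally use split reads and extensive filtering, which are omitted here.

```python
from collections import Counter


def count_bridge_reads(mate_genes, min_support=2):
    """Count read pairs whose two mates were assigned to different genes.

    mate_genes: iterable of (gene_of_mate1, gene_of_mate2); None = unassigned.
    Returns {(gene_a, gene_b): count} for gene pairs with enough support.
    """
    counts = Counter()
    for gene1, gene2 in mate_genes:
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue  # unassigned or concordant pair: not a bridge read
        counts[tuple(sorted((gene1, gene2)))] += 1  # orientation-independent key
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical mate-to-gene assignments for five read pairs.
pairs = [
    ("BCR", "ABL1"),
    ("ABL1", "BCR"),
    ("BCR", "BCR"),
    ("GAPDH", None),
    ("TP53", "MDM2"),
]
candidates = count_bridge_reads(pairs)
```

In this toy input only the BCR/ABL1 pair reaches two supporting read pairs and is reported as a candidate; the single TP53/MDM2 pair falls below the threshold.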

Copy number variation identification

  • CNVseq detects copy number variations using a statistical model derived from array comparative genomic hybridization. Sequence alignment is performed by BLAT, calculations are executed by R modules, and the workflow is fully automated using Perl. There are few other bioinformatics tools that can call copy number alterations (CNAs) from RNA-Seq.[110]
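The read-depth comparison that copy number callers build on can be illustrated with a minimal sketch: reads are counted in genomic windows for a test and a reference sample, normalized by library size, and the log2 ratio per window is thresholded. The window counts, the pseudocount and the 0.6 threshold are hypothetical choices, and this is not CNVseq's statistical model. With RNA-Seq, read depth also reflects expression, so real callers need considerably more than this.

```python
import math


def window_log2_ratios(test_counts, ref_counts, pseudocount=0.5):
    """Library-size-normalized log2(test / reference) read depth per window."""
    if len(test_counts) != len(ref_counts):
        raise ValueError("both samples need the same windows")
    test_total, ref_total = sum(test_counts), sum(ref_counts)
    ratios = []
    for t, r in zip(test_counts, ref_counts):
        t_norm = (t + pseudocount) / test_total  # pseudocount avoids log(0)
        r_norm = (r + pseudocount) / ref_total
        ratios.append(math.log2(t_norm / r_norm))
    return ratios


def call_windows(ratios, threshold=0.6):
    """Label each window as gain, loss or neutral by a fixed log2 cut-off."""
    return [
        "gain" if x >= threshold else "loss" if x <= -threshold else "neutral"
        for x in ratios
    ]


# Hypothetical read counts in four windows.
test = [100, 100, 200, 50]
reference = [100, 100, 100, 100]
calls = call_windows(window_log2_ratios(test, reference))
```

The third window, with roughly twice the normalized depth, is labeled a gain and the fourth a loss, while the first two stay neutral.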

Single cell RNA-Seq

(See also Single cell sequencing.) The traditional RNA-Seq methodology is commonly known as "bulk RNA-Seq": RNA is extracted from a group of cells or a tissue, not from an individual cell as in single cell methods. Some tools developed for bulk RNA-Seq are also applied to single cell analysis; however, new algorithms were developed to address the specific features of this technique.

  • CEL-Seq[111] single-cell RNA-Seq by multiplexed linear amplification.
  • Drop-Seq[112] Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets.
  • FISSEQ Single cell transcriptome sequencing in situ, i.e. without dissociating the cells.
  • Oscope: a statistical pipeline for identifying oscillatory genes in unsynchronized single cell RNA-seq experiments.
  • SCUBA[113] Extracting lineage relationships and modeling dynamic changes associated with multi-lineage cell differentiation.
  • scLVM [114] scLVM is a modelling framework for single-cell RNA-seq data that can be used to dissect the observed heterogeneity into different sources, thereby allowing for the correction of confounding sources of variation.
  • scM&T-Seq Parallel single-cell methylome and transcriptome sequencing.
  • Sphinx[115] SPHINX is a hybrid binning approach that achieves high binning efficiency by utilizing both 'compositional' and 'similarity' features of the query sequence during the binning process. SPHINX can analyze sequences in metagenomic data sets as rapidly as composition-based approaches, while retaining the accuracy and specificity of similarity-based algorithms.
  • TraCeR[116] Paired T-cell receptor reconstruction from single-cell RNA-Seq reads.
  • VDJPuzzle[117] T-cell receptor reconstruction from single-cell RNA-Seq reads, linking the clonotype with the functional phenotype and transcriptome of individual cells.

Integrated Packages

  • Monocle[118] Differential expression and time-series analysis for single-cell RNA-Seq and qPCR experiments.
  • SCANPY[119][120] Scalable Python-based implementation for preprocessing, visualization, clustering, trajectory inference and differential expression testing.
  • SCell[121] integrated analysis of single-cell RNA-seq data.
  • Seurat[122][123] R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
  • Sincell[124] an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell RNA-seq.
  • SINCERA[125] A Pipeline for Single-Cell RNA-Seq Profiling Analysis.

Quality Control and Gene Filtering

  • Celloline[126] A pipeline for mapping and quality assessment of single cell RNA-seq data.
  • OEFinder[127] A user interface to identify and visualize ordering effects in single-cell RNA-seq data.
  • SinQC[128] A Method and Tool to Control Single-cell RNA-seq Data Quality.

Data cleaning and denoising

  • AutoClass[129] A universal AI algorithm for in-depth cleaning of single cell RNA-Seq data.

Normalization

  • BASiCS[130] Understanding changes in gene expression at the single-cell level.
  • GRM[131] Normalization and noise reduction for single cell RNA-seq experiments.
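As a point of reference for the normalization tools above, a common global-scaling scheme divides each cell's counts by its total, multiplies by a fixed scale factor and log-transforms the result. The minimal sketch below assumes a scale factor of 10,000 and cells stored as rows of raw counts; it is not the method of BASiCS or GRM, which rely on statistical models rather than a single per-cell scaling.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Per-cell library-size normalization followed by log(1 + x).

    counts: list of cells, each a list of raw counts per gene.
    Cells with zero total counts are returned as all zeros.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized


# Two hypothetical cells with identical composition but a 10-fold depth difference.
cells = [[1, 3, 0], [10, 30, 0]]
normalized = log_normalize(cells)
```

Because the two cells differ only in sequencing depth, they receive identical normalized profiles.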

Dimension Reduction

  • ZIFA[132] Dimensionality reduction for zero-inflated single-cell gene expression analysis.

Differential Expression

  • BPSC[133] An R package for model fitting and differential expression analyses of single-cell RNA-seq.
  • MAST[134] a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data.
  • SCDE[135] Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis.

Visualization

RNA-Seq simulators

These simulators generate in silico reads and are useful for comparing and testing the efficiency of algorithms developed to handle RNA-Seq data. Moreover, some of them make it possible to analyse and model RNA-Seq protocols.

  • BEERS Simulator is designed for mouse or human data and for paired-end reads sequenced on the Illumina platform. BEERS generates reads starting from a pool of gene models drawn from different published annotation sources. Some genes are chosen at random and errors (such as indels, base changes and low-quality tails) are deliberately introduced into them, followed by the construction of novel splice junctions.
  • compcodeR RNAseq data simulation, differential expression analysis and performance comparison of differential expression methods.
  • CuReSim a customized read simulator.
  • Flux simulator implements a computer pipeline simulation to mimic an RNA-Seq experiment. All component steps that influence RNA-Seq are taken into account in the simulation (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing). These steps present experimental attributes that can be measured, and the approximate experimental biases are captured. Flux Simulator allows each of these steps to be joined as modules to analyse different types of protocols.
  • PBSIM PacBio reads simulator - toward accurate genome assembly.
  • Polyester This Bioconductor package can be used to simulate RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.
  • RandomReads Generates synthetic reads from a genome with an Illumina or PacBio error model. The reads may be paired or unpaired, with arbitrary length and insert size, and are output in FASTA or FASTQ. RandomReads has a wide selection of options for mutation rates, with individual settings for substitution, deletion, insertion, and N rates and length distributions, and annotates reads with their original, unmutated genomic start and stop locations. RandomReads does not vary expression levels and thus is not designed to simulate RNA-seq experiments, but to test the sensitivity and specificity of RNA-seq aligners with de novo introns. It includes a tool for grading reads and generating ROC curves from the resulting SAM files. It is open-source and written in pure Java, and supports all platforms with no recompilation and no other dependencies. It is distributed with BBMap.
  • rlsim is a software package for simulating RNA-seq library preparation with parameter estimation.
  • rnaseqbenchmark A Benchmark for RNA-seq Quantification Pipelines.
  • rnaseqcomp Benchmarks for RNA-seq Quantification Pipelines.
  • RSEM Read Simulator RSEM provides users with the ‘rsem-simulate-reads’ program to simulate RNA-Seq data based on parameters learned from real data sets.
  • RNASeqReadSimulator is a set of simple, command-line-driven Python scripts. It generates random expression levels of transcripts, simulates single- or paired-end reads, can simulate reads with a specific positional bias pattern, and generates random errors from sequencing platforms.
  • RNA Seq Simulator RSS takes SAM alignment files from RNA-Seq data and simulates overdispersed, multiple-replicate, differential, non-stranded RNA-Seq datasets.
  • SimSeq A Nonparametric Approach to Simulation of RNA-Sequence Datasets.
  • WGsim Wgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms.
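The core of a simple read simulator in the spirit of the WGsim description (reads sampled from a reference, uniform substitution errors, no indel errors) can be sketched as follows. This is a hypothetical single-end illustration under those assumptions, not WGsim's code; it does not vary expression levels, model polymorphisms or write FASTQ.

```python
import random

BASES = "ACGT"


def simulate_reads(reference, n_reads, read_length, error_rate, seed=0):
    """Sample reads uniformly from a reference and add substitution errors.

    Each base is independently replaced by a different base with
    probability error_rate. Returns (start_position, read_sequence) pairs,
    so the true origin of every read is known for later benchmarking.
    """
    if read_length > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)  # seeded generator for reproducible simulations
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


reference = "ACGTTGCAAGGCTTACCGAT" * 5  # hypothetical 100-bp reference
reads = simulate_reads(reference, n_reads=10, read_length=25, error_rate=0.01)
```

Keeping the true start position with each read is what allows aligners or quantifiers to be graded against the known answer.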

Transcriptome assemblers

The transcriptome is the total population of RNAs expressed in one cell or group of cells, including non-coding and protein-coding RNAs. There are two types of approaches to assemble transcriptomes. Genome-guided methods use a reference genome (if possible a finished, high-quality genome) as a template to align and assemble reads into transcripts. Genome-independent methods do not require a reference genome and are normally used when a genome is not available. In this case reads are assembled directly into transcripts.

Genome-guided assemblers

  • Bayesembler Bayesian transcriptome assembly.
  • CIDANE Comprehensive isoform discovery and abundance estimation.
  • CLASS CLASS is a program for assembling transcripts from RNA-seq reads aligned to a genome. CLASS produces a set of transcripts in three stages. Stage 1 uses linear programming to determine a set of exons for each gene. Stage 2 builds a splice graph representation of a gene by connecting the exons (vertices) via introns (edges) extracted from spliced read alignments. Stage 3 selects a subset of the candidate transcripts encoded in the graph that can explain all the reads, using either a parsimonious (SET_COVER) or a dynamic programming optimization approach. This stage takes into account constraints derived from mate pairs and spliced alignments and, optionally, knowledge about gene structure extracted from known annotation or alignments of cDNA sequences.
  • Cufflinks Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
  • iReckon iReckon is an algorithm for the simultaneous isoform reconstruction and abundance estimation. In addition to modelling novel isoforms, multi-mapped reads and read duplicates, this method takes into account the possible presence of unspliced pre-mRNA and intron retention. iReckon only requires a set of transcription start and end sites, but can use known full isoforms to improve sensitivity. Starting from the set of nearly all possible isoforms, iReckon uses a regularized EM algorithm to determine those actually present in the sequenced sample, together with their abundances. iReckon is multi-threaded to increase efficiency in all its time-consuming steps.
  • IsoInfer IsoInfer is a C/C++ program to infer isoforms based on short RNA-Seq (single-end and paired-end) reads, exon-intron boundary and TSS/PAS information.
  • IsoLasso IsoLasso is an algorithm to assemble transcripts and estimate their expression levels from RNA-Seq reads.
  • Flipflop FlipFlop implements a method for de novo transcript discovery and abundance estimation from RNA-Seq data. It differs from Cufflinks by simultaneously performing the identification and quantitation tasks using a convex penalized maximum likelihood approach.
  • GIIRA GIIRA is a gene prediction method that identifies potential coding regions exclusively based on the mapping of reads from an RNA-Seq experiment. It was primarily designed for prokaryotic gene prediction and is able to resolve genes within the expressed region of an operon. However, it is also applicable to eukaryotes and predicts exon-intron structures as well as alternative isoforms.
  • MITIE Simultaneous RNA-Seq-based Transcript Identification and Quantification in Multiple Samples.
  • RNAeXpress RNA-eXpress was designed as a user-friendly solution to extract and annotate biologically important transcripts from next generation RNA sequencing data. This approach complements existing gene annotation databases by ensuring all transcripts present in the sample are considered for further analysis.
  • Scripture Scripture is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-Seq peak calling.
  • SLIDE Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation.
  • Strawberry A program for genome-guided transcripts reconstruction and quantification from paired-end RNA-seq.
  • StringTie StringTie is an assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by either the Cuffdiff or Ballgown programs.
  • TransComb a genome-guided transcriptome assembly via combing junctions in splicing graphs.
  • Traph A tool for transcript identification and quantification with RNA-Seq.
  • Tiling Assembly for Annotation-independent Novel Gene Discovery.
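The splice graph described for CLASS (exons as vertices, introns from spliced alignments as edges) can be sketched with a small adjacency list whose start-to-end paths are the candidate transcripts. The exon IDs and introns below are hypothetical, the exons are assumed to be already determined and ordered so the graph is acyclic, and the selection of a parsimonious subset of paths (CLASS's stage 3) is not implemented. Path enumeration grows exponentially for complex genes, which is one reason assemblers select transcripts rather than listing every path.

```python
from collections import defaultdict


def build_splice_graph(introns):
    """Adjacency list: donor exon -> acceptor exons linked by an observed intron."""
    graph = defaultdict(list)
    for donor_exon, acceptor_exon in introns:
        graph[donor_exon].append(acceptor_exon)
    return graph


def enumerate_transcripts(graph, start_exons, end_exons):
    """All exon chains from a start exon to an end exon (graph assumed acyclic)."""
    transcripts = []

    def walk(exon, path):
        path = path + [exon]
        if exon in end_exons:
            transcripts.append(path)
        for next_exon in graph.get(exon, []):
            walk(next_exon, path)

    for exon in start_exons:
        walk(exon, [])
    return transcripts


# Hypothetical gene with three exons; E2 is a cassette exon.
introns = [("E1", "E2"), ("E2", "E3"), ("E1", "E3")]
graph = build_splice_graph(introns)
candidates = enumerate_transcripts(graph, ["E1"], {"E3"})
```

The cassette exon produces two candidate isoforms, one including E2 and one skipping it.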

Genome-independent (de novo) assemblers

  • Bridger,[136] developed at Shandong University, takes advantage of techniques employed in Cufflinks to overcome limitations of the existing de novo assemblers.
  • CLC de novo assembly algorithm of CLC Genomics Workbench.
  • KISSPLICE is software that enables the analysis of RNA-seq data with or without a reference genome. It is an exact local transcriptome assembler that can identify SNPs, indels and alternative splicing events. It can deal with an arbitrary number of biological conditions, and will quantify each variant in each condition.
  • Oases De novo transcriptome assembler for very short reads.
  • rnaSPAdes
  • Rnnotator an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads.
  • SAT-Assembler
  • SOAPdenovo-Trans
  • Scaffolding Translation Mapping
  • Trans-ABySS
  • T-IDBA
  • Trinity a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
  • Velvet
  • TransLiG
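Many de novo assemblers in this list start from a de Bruijn graph of k-mers. The minimal sketch below builds such a graph from reads and walks unbranched chains into contigs. It is a generic illustration under simplifying assumptions (error-free reads, one strand, no reverse complements, contigs started only from source nodes) and does not reproduce Trinity's Inchworm, Chrysalis and Butterfly modules or any other listed assembler.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_contigs(graph):
    """Extend from each source node while the path stays unbranched."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    contigs = []
    for node in sorted(graph):
        if indegree[node] != 0:
            continue  # simplified: only start from nodes with no incoming edge
        for nxt in sorted(graph[node]):
            contig = node + nxt[-1]
            current = nxt
            # Follow nodes with exactly one successor that has a single predecessor.
            while len(graph.get(current, ())) == 1:
                (following,) = graph[current]
                if indegree[following] != 1:
                    break
                contig += following[-1]
                current = following
            contigs.append(contig)
    return contigs


# Two hypothetical overlapping reads from one transcript.
reads = ["ATGGCGT", "GGCGTGC"]
contigs = assemble_contigs(de_bruijn_graph(reads, k=4))
```

The two overlapping reads collapse into the single contig ATGGCGTGC; branches caused by splice variants or sequencing errors would split the walk, which is where real transcriptome assemblers do most of their work.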

Assembly evaluation tools

  • Busco provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB.
  • Detonate DETONATE (DE novo TranscriptOme rNa-seq Assembly with or without the Truth Evaluation) consists of two component packages, RSEM-EVAL and REF-EVAL. Both packages are mainly intended to be used to evaluate de novo transcriptome assemblies, although REF-EVAL can be used to compare sets of any kind of genomic sequence.
  • rnaQUAST Quality Assessment Tool for Transcriptome Assemblies.
  • TransRate Transrate is software for de novo transcriptome assembly quality analysis. It examines your assembly in detail and compares it to experimental evidence such as the sequencing reads, reporting quality scores for contigs and assemblies. This allows you to choose between assemblers and parameters, filter out bad contigs from an assembly, and decide when to stop trying to improve the assembly.

Co-expression networks

  • GeneNetWeaver is an open-source tool for in silico benchmark generation and performance profiling of network inference methods.
  • WGCNA is an R package for weighted correlation network analysis.
  • Pigengene is an R package that infers biological information from gene expression profiles. Based on a coexpression network, it computes eigengenes and effectively uses them as features to fit decision trees and Bayesian networks that are useful in diagnosis and prognosis.[137]
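The weighted correlation networks built by WGCNA use a soft-thresholded adjacency; for an unsigned network, the adjacency between genes i and j is |cor(x_i, x_j)| raised to a power β. The minimal sketch below computes that adjacency in plain Python. β = 6 is only an illustrative choice (in practice β is usually picked with a scale-free topology criterion), the expression values are hypothetical, and module detection and eigengene computation are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(expression, beta=6):
    """Soft-thresholded adjacency a_ij = |cor(x_i, x_j)| ** beta between genes."""
    genes = sorted(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression of four genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, -1.0, 1.0],
}
adjacency = unsigned_adjacency(expression)
```

Raising correlations to a power keeps strong connections near 1 while shrinking weak ones toward 0, so in the unsigned case both the positively (geneB) and negatively (geneC) correlated partners of geneA get adjacency 1, while the uncorrelated geneD gets 0.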

miRNA prediction and analysis

  • iSRAP[138] a one-touch research tool for rapid profiling of small RNA-seq data.
  • SPAR[139] small RNA-seq, short total RNA-seq, miRNA-seq, single-cell small RNA-seq data processing, analysis, annotation, visualization, and comparison against reference ENCODE and DASHR datasets.
  • miRDeep2
  • MIReNA
  • miRExpress
  • miR-PREFeR
  • miRDeep-P For plants
  • miRDeep
  • miRPlant
  • MiRdup
  • ShortStack[140] An alignment and annotation suite intended for small RNA analysis in plants, noted for its focus on high-confidence annotations

Visualization tools

  • ABrowse a customizable next-generation genome browser framework.
  • Artemis Artemis is a free genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the context of the sequence, and also its six-frame translation.
  • Apollo Apollo is designed to support geographically dispersed researchers, and the work of a distributed community is coordinated through automatic synchronization: all edits in one client are instantly pushed to all other clients, allowing users to see annotation updates from collaborators in real-time during the editing process.
  • BamView BamView is a free interactive display of read alignments in BAM data files. It has been developed by the Pathogen Group at the Sanger Institute.
  • BrowserGenome:[141] web-based RNA-seq data analysis and visualization.
  • Degust An interactive web tool for visualising Differential Gene Expression data.
  • DensityMap is a Perl tool for the visualization of feature density along chromosomes.
  • EagleView EagleView is an information-rich genome assembler viewer with data integration capability. EagleView can display a dozen different types of information including base qualities, machine specific trace signals, and genome feature annotations.
  • expvip-web a customisable RNA-seq data analysis and visualisation platform.
  • GBrowse
  • Integrative Genomics Viewer (IGV)
  • GenomeView
  • MapView
  • MicroScope comprehensive genome analysis software suite for gene expression heatmaps.
  • ReadXplorer ReadXplorer is a freely available comprehensive exploration and evaluation tool for NGS data. It extracts and adds quantity and quality measures to each alignment in order to classify the mapped reads. This classification is then taken into account for the different data views and all supported automatic analysis functions.
  • RNASeqExpressionBrowser is a web-based tool which provides means for the search and visualization of RNA-seq expression data (e.g. based on sequence-information or domain annotations). It can generate detailed reports for selected genes including expression data and associated annotations. If needed, links to (publicly available) databases can be easily integrated. The RNASeqExpressionBrowser allows password protection and thereby access restriction to authorized users only.
  • Savant Savant is a next-generation genome browser designed for the latest generation of genome data.
  • Samscope
  • SeqMonk
  • Tablet[142] Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.
  • Tbrowse- HTML5 Transcriptome Browser
  • TBro a transcriptome browser for de novo RNA-sequencing experiments.
  • Vespa

Functional, network and pathway analysis tools

  • BioCyc Visualize RNA-seq data onto individual pathway diagrams, multi-pathway diagrams called pathway collages, and zoomable organism-specific metabolic map diagrams. Computes pathway enrichment.
  • BRANE Clust Biologically-Related Apriori Network Enhancement for Gene Regulatory Network Inference combined with clustering.[143]
  • BRANE Cut Biologically-Related Apriori Network Enhancement with Graph cuts for Gene Regulatory Network Inference.[144]
  • FunRich Functional enrichment analysis tool.
  • GAGE is applicable independent of sample sizes, experimental design, assay platforms, and other types of heterogeneity.[145] This Bioconductor package also provides functions and data for pathway, GO and gene set analysis in general.
  • Gene Set Association Analysis for RNA-Seq GSAASeq are computational methods that assess the differential expression of a pathway/gene set between two biological states based on sequence count data.
  • GeneSCF a real-time based functional enrichment tool with support for multiple organisms.[146]
  • GOexpress[147] Visualise microarray and RNAseq data using gene ontology annotations.
  • GOSeq[148] Gene Ontology analyser for RNA-seq and other length biased data.
  • GSAASEQSP[149] A Toolset for Gene Set Association Analysis of RNA-Seq Data.
  • GSVA[150] gene set variation analysis for microarray and RNA-Seq data.
  • Heat*Seq an interactive web tool for high-throughput sequencing experiment comparison with public data.
  • Ingenuity Systems (commercial) iReport & IPA
  • PathwaySeq[151] Pathway analysis for RNA-Seq data using a score-based approach.
  • petal Co-expression network modelling in R.
  • ToPASeq:[152] an R package for topology-based pathway analysis of microarray and RNA-Seq data.
  • RNA-Enrich A cut-off free functional enrichment testing method for RNA-seq with improved detection power.
  • TRAPID[153][154] Rapid Analysis of Transcriptome Data.
  • T-REx[155] RNA-seq expression analysis.
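Several of the tools above report pathway or gene set enrichment. The simplest form, an over-representation test, asks how surprising it is to find k members of a gene set among the n selected (for example, differentially expressed) genes out of N genes, using the upper tail of the hypergeometric distribution. The sketch below is this generic test under the assumption that gene counts are already known; it is not the method of any specific listed tool. GOSeq, for instance, additionally corrects for gene-length bias, which this sketch ignores.

```python
from math import comb


def hypergeom_enrichment_p(k, n_selected, set_size, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, set_size, n_selected).

    k: selected genes that belong to the gene set
    n_selected: number of selected genes
    set_size: number of genes in the gene set
    n_total: number of genes in the background
    """
    denominator = comb(n_total, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(
        comb(set_size, i) * comb(n_total - set_size, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / denominator


# Hypothetical: 3 of 3 selected genes fall in a 3-gene set out of 10 genes.
p_value = hypergeom_enrichment_p(k=3, n_selected=3, set_size=3, n_total=10)
```

Here the p-value is 1/120, since only one of the 120 possible three-gene selections contains the whole set; in practice the resulting p-values are corrected for testing many gene sets.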

Further annotation tools for RNA-Seq data

  • Frama From RNA-seq data to annotated mRNA assemblies.
  • HLAminer is a computational method for identifying HLA alleles directly from whole genome, exome and transcriptome shotgun sequence datasets. HLA allele predictions are derived by targeted assembly of shotgun sequence data and comparison to a database of reference allele sequences. This tool is developed in Perl and is available as a console tool.
  • pasa PASA, an acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.
  • seq2HLA is an annotation tool for obtaining an individual's HLA class I and II type and expression using standard NGS RNA-Seq data in fastq format. It maps RNA-Seq reads against a reference database of HLA alleles using bowtie, then determines and reports the HLA type, a confidence score and the locus-specific expression level. This tool is developed in Python and R. It is available as a console tool or a Galaxy module.

Compression tools

  • Genozip[156] A compressor for genomic files including FASTQ, SAM/BAM/CRAM, VCF, GFF/GVF/GTF. Contains built-in methods for compression of RNA-Seq BAM files. (Genozip website).
  • Quark Quark enables semi-reference-based compression of RNA-seq data.

RNA-Seq databases

  • ARCHS4 Uniformly processed RNA-seq data from GEO/SRA (>300,000 samples) with metadata search to locate subsets of published samples.
  • ENA The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
  • ENCODE
  • queryable-rna-seq-database Formally known as the Queryable RNA-Seq Database, this system is designed to simplify the process of RNA-seq analysis by providing the ability to upload the result data from RNA-Seq analysis into a database, store it, and query it in many different ways.
  • CIRCpedia v2 is an updated comprehensive database containing circRNA annotations from over 180 RNA-seq datasets across six different species. This atlas allows users to search, browse and download circRNAs with expression characteristics/features in various cell types/tissues, including disease samples. In addition, the updated database incorporates conservation analysis of circRNAs between humans and mice.

Human related

  • Brain RNA-Seq[157] An RNA-Seq transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex.
  • FusionCancer[158] a database of cancer fusion genes derived from RNA-seq data.
  • Hipposeq a comprehensive RNA-seq database of gene expression in hippocampal principal neurons.
  • Mitranscriptome is a systematic list of long poly-adenylated Human RNA transcripts based on RNA-Seq data from more than 6,500 samples associated with a variety of cancer and tissue types. The database contains detailed gene expression analysis of over 91,000 genes, most are uncharacterized long RNAs.
  • RNA-Seq Atlas a reference database for gene expression profiling in normal tissue by next-generation sequencing.
  • SRA The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence.
  • DASHR A database of human small RNA genes and mature products derived from small RNA-seq data.

Single species' RNA-Seq databases

  • Aedes-albopictus Aedes albopictus database.
  • Arabidopsis thaliana TraVa the database of gene expression profiles in Arabidopsis thaliana based on RNA-seq analysis.
  • Barley morexGe
  • EORNA, a barley gene and transcript abundance database (The James Hutton Institute).
  • Chickpea Chickpea transcriptome database (CTDB) has been developed with a view to providing the most comprehensive information about the chickpea transcriptome, the most relevant part of the genome.
  • Chilo suppressalis ChiloDB: a genomic and transcriptome database for an important rice insect pest Chilo suppressalis.
  • Fruit fly FlyAtlas 2 – Drosophila melanogaster RNA-seq database.
  • Echinoderm EchinoDB – a repository of orthologous transcripts from echinoderms.
  • Equine transcriptome (University of California, Davis).
  • Escherichia coli Ecomics – an omics normalized database for Escherichia coli.
  • Fish Phylofish.
  • Ginger Ginger - Ginger transcriptome database.
  • Lygodium japonicum Lygodium japonicum Transcriptome Database.
  • Mammals Mammalian Transcriptomic Database.
  • Oyster (Pacific) GigaTon: an extensive publicly searchable database providing a new reference transcriptome in the Pacific oyster Crassostrea gigas.
  • Mouse and Human PanglaoDB:[159] A gene expression database for exploration and meta-analysis of single cell sequencing data.
  • Mangrove Mangrove Transcriptome Database.
  • Krill (Antarctic) KrillDB: a de novo Transcriptome Database for the Antarctic Krill.
  • Mouse RNASeqMetaDB: a database and web server for navigating metadata of publicly available mouse RNA-Seq data sets.
  • Rubus Rubus GDR RefTrans V1 - GDR Rubus RefTrans combines published RNA-Seq and EST data sets to create a reference transcriptome (RefTrans) for Rubus and provides putative gene function identified by homology to known proteins.
  • Sorghum MOROKOSHI Sorghum transcriptome database. RIKEN full-length cDNA clone and RNA-Seq data in Sorghum bicolor.
  • S. purpuratus S. purpuratus - Developmental Transcriptomes of S. purpuratus.
  • S. cerevisiae YeastMine transcriptome database.
  • Wheat WheatExp – an RNA-seq expression database for polyploid wheat.

References

  1. "RNA-Seq: a revolutionary tool for transcriptomics". Nature Reviews. Genetics 10 (1): 57–63. January 2009. doi:10.1038/nrg2484. PMID 19015660. 
  2. "RNA Sequencing and Analysis". Cold Spring Harbor Protocols 2015 (11): 951–969. April 2015. doi:10.1101/pdb.top084970. PMID 25870306. 
  3. "A survey of best practices for RNA-seq data analysis". Genome Biology 17 (13): 13. January 2016. doi:10.1186/s13059-016-0881-8. PMID 26813401. 
  4. "RNA Sequencing and analysis". Canadian Bioinformatics Workshops. 2012. http://bioinformatics.ca//files/public/BiCG_2012_Module7.pdf. 
  5. "Feasibility of sample size calculation for RNA-seq studies". Briefings in Bioinformatics 19 (4): 713–720. July 2018. doi:10.1093/bib/bbw144. PMID 28100468. 
  6. "Multi-perspective quality control of Illumina RNA sequencing data analysis". Briefings in Functional Genomics 16 (4): 194–204. July 2017. doi:10.1093/bfgp/elw035. PMID 27687708. 
  7. "Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA-minus RNA sequencing data". GigaScience 10 (12): giab080. December 2021. doi:10.1093/gigascience/giab080. PMID 34891161. 
  8. dupRadar: Assessment of duplication rates in RNA-Seq datasets. R package version 1.1.0. 2015. doi:10.18129/B9.bioc.dupRadar. http://bioconductor.org/packages/devel/bioc/html/dupRadar.html. 
  9. "Kraken: a set of tools for quality control and analysis of high-throughput sequence data". Methods 63 (1): 41–49. September 2013. doi:10.1016/j.ymeth.2013.06.027. PMID 23816787. 
  10. "HTSeq--a Python framework to work with high-throughput sequencing data". Bioinformatics 31 (2): 166–169. January 2015. doi:10.1093/bioinformatics/btu638. PMID 25260700. 
  11. "mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data". Nature Communications 6 (7816): 7816. August 2015. doi:10.1038/ncomms8816. PMID 26234653. Bibcode: 2015NatCo...6.7816F. 
  12. "MultiQC: summarize analysis results for multiple tools and samples in a single report". Bioinformatics 32 (19): 3047–3048. October 2016. doi:10.1093/bioinformatics/btw354. PMID 27312411. 
  13. "RNA-SeQC: RNA-seq metrics for quality control and process optimization". Bioinformatics 28 (11): 1530–1532. June 2012. doi:10.1093/bioinformatics/bts196. PMID 22539670. 
  14. "RSeQC: quality control of RNA-seq experiments". Bioinformatics 28 (16): 2184–2185. August 2012. doi:10.1093/bioinformatics/bts356. PMID 22743226. 
  15. "SAMStat: monitoring biases in next generation sequencing data". Bioinformatics 27 (1): 130–131. January 2011. doi:10.1093/bioinformatics/btq614. PMID 21088025. 
  16. "IVT-seq reveals extreme bias in RNA sequencing". Genome Biology 15 (6): R86. June 2014. doi:10.1186/gb-2014-15-6-r86. PMID 24981968. 
  17. "Detecting and correcting systematic variation in large-scale RNA sequencing data". Nature Biotechnology 32 (9): 888–895. September 2014. doi:10.1038/nbt.3000. PMID 25150837. 
  18. "Summarizing and correcting the GC content bias in high-throughput sequencing". Nucleic Acids Research 40 (10): e72. May 2012. doi:10.1093/nar/gks001. PMID 22323520. 
  19. "Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries". Genome Biology 12 (2): R18. 2011. doi:10.1186/gb-2011-12-2-r18. PMID 21338519. 
  20. "Comparative analysis of RNA sequencing methods for degraded or low-input samples". Nature Methods 10 (7): 623–629. July 2013. doi:10.1038/nmeth.2483. PMID 23685885. 
  21. "Sequence-specific error profile of Illumina sequencers". Nucleic Acids Research 39 (13): e90. July 2011. doi:10.1093/nar/gkr344. PMID 21576222. 
  22. "Biases in Illumina transcriptome sequencing caused by random hexamer priming". Nucleic Acids Research 38 (12): e131. July 2010. doi:10.1093/nar/gkq224. PMID 20395217. 
  23. "AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads". Genomics 102 (5–6): 500–506. November 2013. doi:10.1016/j.ygeno.2013.07.011. PMID 23912058. 
  24. "ConDeTri--a content dependent read trimmer for Illumina data". PLOS ONE 6 (10): e26314. 19 October 2011. doi:10.1371/journal.pone.0026314. PMID 22039460. Bibcode: 2011PLoSO...626314S. 
  25. "FLASH: fast length adjustment of short reads to improve genome assemblies". Bioinformatics 27 (21): 2957–2963. November 2011. doi:10.1093/bioinformatics/btr507. PMID 21903629. 
  26. "Erne-Bs5". Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. 12. 2012. pp. 12–19. doi:10.1145/2382936.2382938. ISBN 9781450316705. 
  27. "Quality control and preprocessing of metagenomic datasets". Bioinformatics 27 (6): 863–864. March 2011. doi:10.1093/bioinformatics/btr026. PMID 21278185. 
  28. "Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis". G3 3 (2): 359–367. February 2013. doi:10.1534/g3.112.003871. PMID 23390612. 
  29. "Trimmomatic: a flexible trimmer for Illumina sequence data". Bioinformatics 30 (15): 2114–2120. August 2014. doi:10.1093/bioinformatics/btu170. PMID 24695404. 
  30. "Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction". Briefings in Bioinformatics 17 (1): 154–179. January 2016. doi:10.1093/bib/bbv029. PMID 26026159. 
  31. "Removing noise from pyrosequenced amplicons". BMC Bioinformatics 12 (38): 38. January 2011. doi:10.1186/1471-2105-12-38. PMID 21276213. 
  32. "BLESS: bloom filter-based error correction solution for high-throughput sequencing reads". Bioinformatics 30 (10): 1354–1362. May 2014. doi:10.1093/bioinformatics/btu030. PMID 24451628. 
  33. "Blue: correcting sequencing errors using consensus and context". Bioinformatics 30 (19): 2723–2732. October 2014. doi:10.1093/bioinformatics/btu368. PMID 24919879. 
  34. Michael I Love; John B Hogenesch; Rafael A Irizarry (2015). "Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation". bioRxiv 10.1101/025767.
  35. "Removing technical variability in RNA-seq data using conditional quantile normalization". Biostatistics 13 (2): 204–216. April 2012. doi:10.1093/biostatistics/kxr054. PMID 22285995. 
  36. "GC-content normalization for RNA-Seq data". BMC Bioinformatics 12 (1): 480. December 2011. doi:10.1186/1471-2105-12-480. PMID 22177264. 
  37. "Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses". Nature Protocols 7 (3): 500–507. February 2012. doi:10.1038/nprot.2011.457. PMID 22343431. 
  38. "Normalization of RNA-seq data using factor analysis of control genes or samples". Nature Biotechnology 32 (9): 896–902. September 2014. doi:10.1038/nbt.2931. PMID 25150836. 
  39. "Identification and correction of systematic error in high-throughput sequence data". BMC Bioinformatics 12 (1): 451. November 2011. doi:10.1186/1471-2105-12-451. PMID 22099972. 
  40. "COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly". Bioinformatics 28 (22): 2870–2874. November 2012. doi:10.1093/bioinformatics/bts563. PMID 23044551. 
  41. "PEAR: a fast and accurate Illumina Paired-End reAd mergeR". Bioinformatics 30 (5): 614–620. March 2014. doi:10.1093/bioinformatics/btt593. PMID 24142950. 
  42. "Unlocking short read sequencing for metagenomics". PLOS ONE 5 (7): e11840. July 2010. doi:10.1371/journal.pone.0011840. PMID 20676378. Bibcode: 2010PLoSO...511840R. 
  43. "From trash to treasure: detecting unexpected contamination in unmapped NGS data". BMC Bioinformatics 20 (Suppl 4): 168. April 2019. doi:10.1186/s12859-019-2684-x. PMID 30999839. 
  44. "The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote". Nucleic Acids Research 41 (10): e108. May 2013. doi:10.1093/nar/gkt214. PMID 23558742. 
  45. "Methods to Study Splicing from High-Throughput RNA Sequencing Data". Spliceosomal Pre-mRNA Splicing. Methods in Molecular Biology. 1126. 2014. pp. 357–97. doi:10.1007/978-1-62703-980-2_26. ISBN 978-1-62703-979-6. 
  46. "Simulation-based comprehensive benchmarking of RNA-seq aligners". Nature Methods 14 (2): 135–139. February 2017. doi:10.1038/nmeth.4106. PMID 27941783. 
  47. "PASS-bis: a bisulfite aligner suitable for whole methylome analysis of Illumina and SOLiD reads". Bioinformatics 29 (2): 268–270. January 2013. doi:10.1093/bioinformatics/bts675. PMID 23162053. 
  48. "RASER: reads aligner for SNPs and editing sites of RNA". Bioinformatics 31 (24): 3906–3913. December 2015. doi:10.1093/bioinformatics/btv505. PMID 26323713. 
  49. "STAR: ultrafast universal RNA-seq aligner". Bioinformatics 29 (1): 15–21. January 2013. doi:10.1093/bioinformatics/bts635. PMID 23104886. 
  50. "TopHat: discovering splice junctions with RNA-Seq". Bioinformatics 25 (9): 1105–1111. May 2009. doi:10.1093/bioinformatics/btp120. PMID 19289445. 
  51. Pachter L (2011). "Models for transcript quantification from RNA-Seq". arXiv:1104.3889 [q-bio.GN].
  52. "Comprehensive evaluation of RNA-seq quantification methods for linearity". BMC Bioinformatics 18 (Suppl 4): 117. March 2017. doi:10.1186/s12859-017-1526-y. PMID 28361706. 
  53. "A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data". American Journal of Botany 99 (2): 248–256. February 2012. doi:10.3732/ajb.1100340. PMID 22268221. 
  54. "A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis". Briefings in Bioinformatics 14 (6): 671–683. November 2013. doi:10.1093/bib/bbs046. PMID 22988256. 
  55. "Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions". Briefings in Bioinformatics 19 (5): 776–792. September 2018. doi:10.1093/bib/bbx008. PMID 28334202. 
  56. "Empirical bayes analysis of sequencing-based transcriptional profiling without replicates". BMC Bioinformatics 11: 564. November 2010. doi:10.1186/1471-2105-11-564. PMID 21080965. 
  57. Hajiramezanali, E.; Dadaneh, S. Z.; Figueiredo, P. d.; Sze, S.; Zhou, Z.; Qian, X. (2018). "Differential Expression Analysis of Dynamical Sequencing Count Data with a Gamma Markov Chain". arXiv:1803.02527.
  58. "Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation". Nature Biotechnology 28 (5): 511–515. May 2010. doi:10.1038/nbt.1621. PMID 20436464. 
  59. "DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions". Nucleic Acids Research 41 (21): e198. November 2013. doi:10.1093/nar/gkt834. PMID 24049071. 
  60. "DGEclust: differential expression analysis of clustered count data". Genome Biology 16 (1): 39. February 2015. doi:10.1186/s13059-015-0604-6. PMID 25853652. 
  61. "GFOLD: a generalized fold change for ranking differentially expressed genes from RNA-seq data". Bioinformatics 28 (21): 2782–2788. November 2012. doi:10.1093/bioinformatics/bts515. PMID 22923299. 
  62. "Testing for association between RNA-Seq and high-dimensional data". BMC Bioinformatics 17 (118): 118. March 2016. doi:10.1186/s12859-016-0961-5. PMID 26951498. 
  63. "Large scale maximum average power multiple inference on time-course count data with application to RNA-seq analysis". Biometrics 76 (1): 9–22. March 2020. doi:10.1111/biom.13144. PMID 31483480. 
  64. "Systematic integration of RNA-Seq statistical algorithms for accurate detection of differential gene expression patterns". Nucleic Acids Research 43 (4): e25. February 2015. doi:10.1093/nar/gku1273. PMID 25452340. 
  65. Hoogstrate, Youri; Draaisma, Kaspar; Ghisai, Santoesha A.; van Hijfte, Levi; Barin, Nastaran; de Heer, Iris; Coppieters, Wouter; van den Bosch, Thierry P. P. et al. (9 March 2023). "Transcriptome analysis reveals tumor microenvironment changes in glioblastoma". Cancer Cell 41 (4): 678–692.e7. doi:10.1016/j.ccell.2023.02.019. PMID 36898379. 
  66. Rauschenberger A, Menezes RX, van de Wiel MA, van Schoor NM, Jonker MA (2018). "Detecting SNPs with interactive effects on a quantitative trait". arXiv:1805.09175 [stat.ME].
  67. "TPMCalculator: one-step software to quantify mRNA abundance of genomic features". Bioinformatics 35 (11): 1960–1962. June 2019. doi:10.1093/bioinformatics/bty896. PMID 30379987. 
  68. "TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements". PLOS Computational Biology 15 (8): e1007293. August 2019. doi:10.1371/journal.pcbi.1007293. PMID 31425522. Bibcode: 2019PLSCB..15E7293N. 
  69. "Omics Playground: a comprehensive self-service platform for visualization, analytics and exploration of Big Omics Data". NAR Genomics and Bioinformatics 2 (1): lqz019. March 2020. doi:10.1093/nargab/lqz019. PMID 33575569. 
  70. "BioQueue: a novel pipeline framework to accelerate bioinformatics analysis". Bioinformatics 33 (20): 3286–3288. October 2017. doi:10.1093/bioinformatics/btx403. PMID 28633441. 
  71. "BioWardrobe: an integrated platform for analysis of epigenomics and transcriptomics data". Genome Biology 16 (1): 158. August 2015. doi:10.1186/s13059-015-0720-3. PMID 26248465. 
  72. "LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes". PLOS ONE 10 (11): e0143329. 2015. doi:10.1371/journal.pone.0143329. PMID 26606265. Bibcode: 2015PLoSO..1043329L. 
  73. "Differential and coherent processing patterns from small RNAs". Scientific Reports 5: 12062. July 2015. doi:10.1038/srep12062. PMID 26166713. Bibcode: 2015NatSR...512062P. 
  74. "SpliceGrapher: detecting patterns of alternative splicing from RNA-Seq data in the context of gene models and EST data". Genome Biology 13 (1): R4. January 2012. doi:10.1186/gb-2012-13-1-r4. PMID 22293517. 
  75. "SpliceGrapherXT". Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics. BCB'13. New York, NY, USA: ACM. 2013. pp. 247:247–247:255. doi:10.1145/2506583.2506625. ISBN 9781450324342. http://doi.acm.org/10.1145/2506583.2506625. 
  76. "SpliceTrap: a method to quantify alternative splicing under single cellular conditions". Bioinformatics 27 (21): 3010–3016. November 2011. doi:10.1093/bioinformatics/btr508. PMID 21896509. 
  77. "The Landscape of Isoform Switches in Human Cancers". Molecular Cancer Research 15 (9): 1206–1220. September 2017. doi:10.1158/1541-7786.mcr-16-0459. PMID 28584021. 
  78. "DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics". F1000Research 5: 1356. 2016-12-06. doi:10.12688/f1000research.8900.2. PMID 28105305. 
  79. "Bayesian estimation of differential transcript usage from RNA-seq data". Statistical Applications in Genetics and Molecular Biology 16 (5–6): 367–386. November 2017. doi:10.1515/sagmb-2017-0005. PMID 29091583. Bibcode: 2017arXiv170103095P. 
  80. "rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data". Bioinformatics 31 (13): 2222–2224. July 2015. doi:10.1093/bioinformatics/btv119. PMID 25717189. 
  81. Jones DC, Kuppusamy KT, Palpant NJ, Peng X, Murry CE, Ruohola-Baker H, Ruzzo WL (2016-11-20). "Isolator: accurate and stable analysis of isoform-level expression in RNA-Seq experiments". bioRxiv 10.1101/088765.
  82. "Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data". Scientific Reports 6: 21597. February 2016. doi:10.1038/srep21597. PMID 26862001. Bibcode: 2016NatSR...621597K. 
  83. "Accurate and efficient detection of gene fusions from RNA sequencing data". Genome Research 31 (3): 448–460. March 2021. doi:10.1101/gr.257246.119. PMID 33441414. 
  84. "A community challenge to evaluate RNA-seq, fusion detection, and isoform quantification methods for cancer discovery". Cell Systems 12 (8): 827–838.e5. August 2021. doi:10.1016/j.cels.2021.05.021. PMID 34146471. 
  85. Abate, Francesco; Acquaviva, Andrea; Paciello, Giulia; Foti, Carmelo; Ficarra, Elisa; Ferrarini, Alberto; Delledonne, Massimo; Iacobucci, Ilaria et al. (2012-08-15). "Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model". Bioinformatics 28 (16): 2114–2121. doi:10.1093/bioinformatics/bts334. ISSN 1367-4811. PMID 22711792. 
  86. Fan, Xian; Abbott, Travis E.; Larson, David; Chen, Ken (2014). "BreakDancer: Identification of Genomic Structural Variation from Paired-End Read Mapping". Current Protocols in Bioinformatics 45: 15.6.1–11. doi:10.1002/0471250953.bi1506s45. ISSN 1934-340X. PMID 25152801. 
  87. Chen, Ken; Wallis, John W.; Kandoth, Cyriac; Kalicki-Veizer, Joelle M.; Mungall, Karen L.; Mungall, Andrew J.; Jones, Steven J.; Marra, Marco A. et al. (2012-07-15). "BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data". Bioinformatics 28 (14): 1923–1924. doi:10.1093/bioinformatics/bts272. ISSN 1367-4811. PMID 22563071. 
  88. Iyer, Matthew K.; Chinnaiyan, Arul M.; Maher, Christopher A. (2011-08-11). "ChimeraScan: a tool for identifying chimeric transcription in sequencing data". Bioinformatics 27 (20): 2903–2904. doi:10.1093/bioinformatics/btr467. ISSN 1367-4811. PMID 21840877. PMC 3187648. http://dx.doi.org/10.1093/bioinformatics/btr467. 
  89. Chu, Hsueh-Ting; Hsiao, William W. L.; Chen, Jen-Chih; Yeh, Tze-Jung; Tsai, Mong-Hsun; Lin, Han; Liu, Yen-Wenn; Lee, Sheng-An et al. (2013-03-01). "EBARDenovo: highly accurate de novo assembly of RNA-Seq with efficient chimera-detection". Bioinformatics 29 (8): 1004–1010. doi:10.1093/bioinformatics/btt092. ISSN 1367-4811. PMID 23457040. http://dx.doi.org/10.1093/bioinformatics/btt092. 
  90. Haas, Brian J.; Dobin, Alex; Stransky, Nicolas; Li, Bo; Yang, Xiao; Tickle, Timothy; Bankapur, Asma; Ganote, Carrie et al. (2017-03-24). STAR-Fusion: Fast and Accurate Fusion Transcript Detection from RNA-Seq. doi:10.1101/120295. http://dx.doi.org/10.1101/120295. Retrieved 2023-08-30. 
  91. Nicorici, Daniel; Satalan, Mihaela; Edgren, Henrik; Kangaspeska, Sara; Murumagi, Astrid; Kallioniemi, Olli; Virtanen, Sami; Kilkku, Olavi (2014-11-19). FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data. doi:10.1101/011650. http://dx.doi.org/10.1101/011650. Retrieved 2023-08-30. 
  92. Okonechnikov, Konstantin; Imai-Matsushima, Aki; Paul, Lukas; Seitz, Alexander; Meyer, Thomas F.; Garcia-Alcalde, Fernando (2016-12-01). "InFusion: Advancing Discovery of Fusion Genes and Chimeric Transcripts from Deep RNA-Sequencing Data". PLOS ONE 11 (12): e0167417. doi:10.1371/journal.pone.0167417. ISSN 1932-6203. PMID 27907167. Bibcode: 2016PLoSO..1167417O. 
  93. "MapSplice: accurate mapping of RNA-seq reads for splice junction discovery". Nucleic Acids Research 38 (18): e178. October 2010. doi:10.1093/nar/gkq622. PMID 20802226. 
  94. "SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data". Genome Biology 14 (2): R12. February 2013. doi:10.1186/gb-2013-14-2-r12. PMID 23409703. 
  95. Weber, David; Ibn-Salem, Jonas; Sorn, Patrick; Suchan, Martin; Holtsträter, Christoph; Lahrmann, Urs; Vogler, Isabel; Schmoldt, Kathrin et al. (2022-04-04). "Accurate detection of tumor-specific gene fusions reveals strongly immunogenic personal neo-antigens". Nature Biotechnology 40 (8): 1276–1284. doi:10.1038/s41587-022-01247-9. ISSN 1087-0156. PMID 35379963. PMC 7613288. http://dx.doi.org/10.1038/s41587-022-01247-9. 
  96. Benelli, Matteo; Pescucci, Chiara; Marseglia, Giuseppina; Severgnini, Marco; Torricelli, Francesca; Magi, Alberto (2012-10-23). "Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript". Bioinformatics 28 (24): 3232–3239. doi:10.1093/bioinformatics/bts617. ISSN 1367-4811. PMID 23093608. 
  97. "Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers". Proceedings of the National Academy of Sciences of the United States of America 116 (31): 15524–15533. July 2019. doi:10.1073/pnas.1900391116. PMID 31308241. Bibcode: 2019PNAS..11615524D. 
  98. McPherson, Andrew; Hormozdiari, Fereydoun; Zayed, Abdalnasser; Giuliany, Ryan; Ha, Gavin; Sun, Mark G. F.; Griffith, Malachi; Heravi Moussavi, Alireza et al. (May 2011). "deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data". PLOS Computational Biology 7 (5): e1001138. doi:10.1371/journal.pcbi.1001138. ISSN 1553-7358. PMID 21625565. Bibcode: 2011PLSCB...7E1138M. 
  99. "The EGFRvIII transcriptome in glioblastoma: A meta-omics analysis". Neuro-Oncology 24 (3): 429–441. March 2022. doi:10.1093/neuonc/noab231. PMID 34608482. 
  100. Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo (September 2012). "FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery". Nucleic Acids Research 40 (16): e123. doi:10.1093/nar/gks394. ISSN 1362-4962. PMID 22570408. 
  101. Ge, Huanying; Liu, Kejun; Juan, Todd; Fang, Fang; Newman, Matthew; Hoeck, Wolfgang (2011-05-18). "FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution". Bioinformatics 27 (14): 1922–1928. doi:10.1093/bioinformatics/btr310. ISSN 1367-4803. PMID 21593131. http://dx.doi.org/10.1093/bioinformatics/btr310. 
  102. Sboner, Andrea; Habegger, Lukas; Pflueger, Dorothee; Terry, Stephane; Chen, David Z; Rozowsky, Joel S; Tewari, Ashutosh K; Kitabayashi, Naoki et al. (October 2010). "FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data". Genome Biology 11 (10): R104. doi:10.1186/gb-2010-11-10-r104. ISSN 1474-760X. PMID 20964841. 
  103. Davidson, Nadia M; Majewski, Ian J; Oshlack, Alicia (2015-01-12). "JAFFA: High sensitivity transcriptome-focused fusion gene detection". Genome Medicine 7 (1): 43. doi:10.1186/s13073-015-0167-x. PMID 26019724. 
  104. McPherson, Andrew; Wu, Chunxiao; Wyatt, Alexander W.; Shah, Sohrab; Collins, Colin; Sahinalp, S. Cenk (2012-06-28). "nFuse: Discovery of complex genomic rearrangements in cancer using high-throughput sequencing". Genome Research 22 (11): 2250–2261. doi:10.1101/gr.136572.111. ISSN 1088-9051. PMID 22745232. PMC 3483554. http://dx.doi.org/10.1101/gr.136572.111. 
  105. Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F.; Weinstein, John N. et al. (2014-04-01). "PRADA: pipeline for RNA sequencing data analysis". Bioinformatics 30 (15): 2224–2226. doi:10.1093/bioinformatics/btu169. ISSN 1367-4811. PMID 24695405. PMC 4103589. http://dx.doi.org/10.1093/bioinformatics/btu169. 
  106. Wu, Jikun; Zhang, Wenqian; Huang, Songbo; He, Zengquan; Cheng, Yanbing; Wang, Jun; Lam, Tak-Wah; Peng, Zhiyu et al. (2013-10-11). "SOAPfusion: a robust and effective computational fusion discovery tool for RNA-seq reads". Bioinformatics 29 (23): 2971–2978. doi:10.1093/bioinformatics/btt522. ISSN 1367-4811. PMID 24123671. http://dx.doi.org/10.1093/bioinformatics/btt522. 
  107. Kim, Daehwan; Salzberg, Steven L (2011). "TopHat-Fusion: an algorithm for discovery of novel fusion transcripts". Genome Biology 12 (8): R72. doi:10.1186/gb-2011-12-8-r72. ISSN 1465-6906. PMID 21835007. 
  108. Li, Jing-Woei; Wan, Raymond; Yu, Chi-Shing; Co, Ngai Na; Wong, Nathalie; Chan, Ting-Fung (2013-01-12). "ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution". Bioinformatics 29 (5): 649–651. doi:10.1093/bioinformatics/btt011. ISSN 1367-4811. PMID 23314323. PMC 3582262. http://dx.doi.org/10.1093/bioinformatics/btt011. 
  109. "Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data". Nucleic Acids Research 42 (2): e11. January 2014. doi:10.1093/nar/gkt916. PMID 24137010. 
  110. "Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology". Briefings in Bioinformatics 22 (6). November 2021. doi:10.1093/bib/bbab259. PMID 34329375. 
  111. "CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification". Cell Reports 2 (3): 666–673. September 2012. doi:10.1016/j.celrep.2012.08.003. PMID 22939981. 
  112. "Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets". Cell 161 (5): 1202–1214. May 2015. doi:10.1016/j.cell.2015.05.002. PMID 26000488. 
  113. "Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape". Proceedings of the National Academy of Sciences of the United States of America 111 (52): E5643–E5650. December 2014. doi:10.1073/pnas.1408993111. PMID 25512504. Bibcode: 2014PNAS..111E5643M. 
  114. "Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells". Nature Biotechnology 33 (2): 155–160. February 2015. doi:10.1038/nbt.3102. PMID 25599176. 
  115. "SPHINX--an algorithm for taxonomic binning of metagenomic sequences". Bioinformatics 27 (1): 22–30. January 2011. doi:10.1093/bioinformatics/btq608. PMID 21030462. 
  116. "T cell fate and clonality inference from single-cell transcriptomes". Nature Methods 13 (4): 329–332. April 2016. doi:10.1038/nmeth.3800. PMID 26950746. 
  117. "Linking the T cell receptor to the single cell transcriptome in antigen-specific human T cells". Immunology and Cell Biology 94 (6): 604–611. July 2016. doi:10.1038/icb.2016.16. PMID 26860370. 
  118. "Monocle 3". https://cole-trapnell-lab.github.io/monocle3/. 
  119. "SCANPY: large-scale single-cell gene expression data analysis". Genome Biology 19 (1): 15. February 2018. doi:10.1186/s13059-017-1382-0. PMID 29409532. 
  120. "Scanpy – Single-Cell Analysis in Python — Scanpy 1.8.1 documentation". readthedocs.io. https://scanpy.readthedocs.io/en/stable/. 
  121. "SCell: integrated analysis of single-cell RNA-seq data". Bioinformatics 32 (14): 2219–2220. July 2016. doi:10.1093/bioinformatics/btw201. PMID 27153637. 
  122. "Integrating single-cell transcriptomic data across different conditions, technologies, and species". Nature Biotechnology 36 (5): 411–420. June 2018. doi:10.1038/nbt.4096. PMID 29608179. 
  123. "Integrated analysis of multimodal single-cell data". Cell 184 (13): 3573–3587.e29. June 2021. doi:10.1016/j.cell.2021.04.048. PMID 34062119. 
  124. "Sincell: an R/Bioconductor package for statistical assessment of cell-state hierarchies from single-cell RNA-seq". Bioinformatics 31 (20): 3380–3382. October 2015. doi:10.1093/bioinformatics/btv368. PMID 26099264. 
  125. "SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis". PLOS Computational Biology 11 (11): e1004575. November 2015. doi:10.1371/journal.pcbi.1004575. PMID 26600239. Bibcode: 2015PLSCB..11E4575G. 
  126. "Classification of low quality cells from single-cell RNA-seq data". Genome Biology 17 (1): 29. February 2016. doi:10.1186/s13059-016-0888-1. PMID 26887813. 
  127. "OEFinder: a user interface to identify and visualize ordering effects in single-cell RNA-seq data". Bioinformatics 32 (9): 1408–1410. May 2016. doi:10.1093/bioinformatics/btw004. PMID 26743507. 
  128. "Quality control of single-cell RNA-seq by SinQC". Bioinformatics 32 (16): 2514–2516. August 2016. doi:10.1093/bioinformatics/btw176. PMID 27153613. 
  129. "A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data". Nature Communications 13 (1): 1901. April 2022. doi:10.1038/s41467-022-29576-y. PMID 35393428. Bibcode: 2022NatCo..13.1901L. 
  130. "BASiCS: Bayesian Analysis of Single-Cell Sequencing Data". PLOS Computational Biology 11 (6): e1004333. June 2015. doi:10.1371/journal.pcbi.1004333. PMID 26107944. Bibcode: 2015PLSCB..11E4333V. 
  131. "Normalization and noise reduction for single cell RNA-seq experiments". Bioinformatics 31 (13): 2225–2227. July 2015. doi:10.1093/bioinformatics/btv122. PMID 25717193. 
  132. "ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis". Genome Biology 16 (241): 241. November 2015. doi:10.1186/s13059-015-0805-z. PMID 26527291. 
  133. "Beta-Poisson model for single-cell RNA-seq data analyses". Bioinformatics 32 (14): 2128–2135. July 2016. doi:10.1093/bioinformatics/btw202. PMID 27153638. 
  134. "MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data". Genome Biology 16 (1): 278. December 2015. doi:10.1186/s13059-015-0844-5. PMID 26653891. 
  135. "Bayesian approach to single-cell differential expression analysis". Nature Methods 11 (7): 740–742. July 2014. doi:10.1038/nmeth.2967. PMID 24836921. 
  136. "Bridger: a new framework for de novo transcriptome assembly using RNA-seq data". Genome Biology 16 (1): 30. February 2015. doi:10.1186/s13059-015-0596-2. PMID 25723335. 
  137. "Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications". BMC Medical Genomics 10 (1): 16. March 2017. doi:10.1186/s12920-017-0253-6. PMID 28298217. 
  138. "iSRAP - a one-touch research tool for rapid profiling of small RNA-seq data". Journal of Extracellular Vesicles 4: 29454. 2015. doi:10.3402/jev.v4.29454. PMID 26561006. 
  139. "SPAR: small RNA-seq portal for analysis of sequencing experiments". Nucleic Acids Research 46 (W1): W36–W42. July 2018. doi:10.1093/nar/gky330. PMID 29733404. 
  140. "Improved Placement of Multi-mapping Small RNAs". G3 6 (7): 2103–2111. July 2016. doi:10.1534/g3.116.030452. PMID 27175019. 
  141. "BrowserGenome.org: web-based RNA-seq data analysis and visualization". Nature Methods 12 (11): 1001. November 2015. doi:10.1038/nmeth.3615. PMID 26513548. 
  142. "Using Tablet for visual exploration of second-generation sequencing data". Briefings in Bioinformatics 14 (2): 193–202. March 2013. doi:10.1093/bib/bbs012. PMID 22445902. 
  143. "BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement". IEEE/ACM Transactions on Computational Biology and Bioinformatics 15 (3): 850–860. 2017. doi:10.1109/TCBB.2017.2688355. PMID 28368827. 
  144. "BRANE Cut: biologically-related a priori network enhancement with graph cuts for gene regulatory network inference". BMC Bioinformatics 16: 368. November 2015. doi:10.1186/s12859-015-0754-2. PMID 26537179. 
  145. "GAGE: generally applicable gene set enrichment for pathway analysis". BMC Bioinformatics 10 (161): 161. May 2009. doi:10.1186/1471-2105-10-161. PMID 19473525. 
  146. "GeneSCF: a real-time based functional enrichment tool with support for multiple organisms". BMC Bioinformatics 17 (1): 365. September 2016. doi:10.1186/s12859-016-1250-z. PMID 27618934. 
  147. "Visualise microarray and RNAseq data using gene ontology annotations. R package version 1.4.1". 2014. https://github.com/kevinrue/GOexpress. 
  148. Young MD; Wakefield MJ; Smyth GK; Oshlack A (2010). "Gene ontology analysis for RNA-seq: accounting for selection bias". Genome Biology 11 (2): R14. doi:10.1186/gb-2010-11-2-r14. PMID 20132535. 
  149. "GSAASeqSP: a toolset for gene set association analysis of RNA-Seq data". Scientific Reports 4 (6347): 6347. September 2014. doi:10.1038/srep06347. PMID 25213199. Bibcode: 2014NatSR...4E6347X. 
  150. "GSVA: gene set variation analysis for microarray and RNA-seq data". BMC Bioinformatics 14 (17): 7. January 2013. doi:10.1186/1471-2105-14-7. PMID 23323831. 
  151. "Pathway analysis for RNA-Seq data using a score-based approach". Biometrics 72 (1): 165–174. March 2016. doi:10.1111/biom.12372. PMID 26259845. 
  152. "ToPASeq: an R package for topology-based pathway analysis of microarray and RNA-Seq data". BMC Bioinformatics 16 (350): 350. October 2015. doi:10.1186/s12859-015-0763-1. PMID 26514335. 
  153. "TRAPID: an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes". Genome Biology 14 (12): R134. December 2013. doi:10.1186/gb-2013-14-12-r134. PMID 24330842. 
  154. "TRAPID 2.0: a web application for taxonomic and functional analysis of de novo transcriptomes". Nucleic Acids Research 49 (17): e101. September 2021. doi:10.1093/nar/gkab565. PMID 34197621. 
  155. "T-REx: Transcriptome analysis webserver for RNA-seq Expression data". BMC Genomics 16 (663): 663. September 2015. doi:10.1186/s12864-015-1834-4. PMID 26335208. 
  156. "Genozip 14 - advances in compression of BAM and CRAM files". bioRxiv. 14 September 2022. doi:10.1101/2022.09.12.507582. 
  157. "An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex". The Journal of Neuroscience 34 (36): 11929–11947. September 2014. doi:10.1523/JNEUROSCI.1860-14.2014. PMID 25186741. 
  158. "FusionCancer: a database of cancer fusion genes derived from RNA-seq data". Diagnostic Pathology 10 (131): 131. July 2015. doi:10.1186/s13000-015-0310-4. PMID 26215638. 
  159. "PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data". Database 2019: baz046. January 2019. doi:10.1093/database/baz046. PMID 30951143. 

External links