Informatics for RNA-seq Analysis
High-throughput sequencing of RNA libraries (RNA-seq) has become increasingly common and has largely supplanted gene microarrays for transcriptome profiling. When processed appropriately, RNA-seq data can provide a considerably more detailed view of the transcriptome. The CBW has developed a 3-day course that provides an introduction to RNA-seq data analysis, followed by integrated tutorials demonstrating the use of popular RNA-seq analysis packages. The tutorials are designed as self-contained units that include example data (Illumina paired-end RNA-seq data) and detailed instructions for installing all required bioinformatics tools (HISAT2, StringTie, etc.).
Participants will gain practical experience and skills to be able to:
- Perform command-line, Linux-based analysis on the cloud
- Assess quality of RNA-seq data
- Align RNA-seq data to a reference genome
- Estimate known gene and transcript expression
- Perform differential expression analysis
- Discover novel isoforms
- Visualize and summarize the output of RNA-seq analyses in R
- Assemble transcripts from RNA-seq data
This workshop is aimed at graduates, postgraduates, and PIs who are working on, or about to embark on, an analysis of RNA-seq data. Attendees may be familiar with some aspects of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.
Prerequisites: basic familiarity with the Linux environment and with S, R, or Matlab.
You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact firstname.lastname@example.org for other possible options.
This workshop requires participants to complete pre-workshop tasks and readings.
Module 1: Introduction to Cloud Computing
Module 2: Introduction to RNA sequencing and analysis
- Basic introduction to biology of RNA-seq
- Experimental design and analysis considerations
- Commonly asked questions
Module 3: RNA-seq alignment and visualization
- RNA-seq alignment challenges and common questions
- Alignment strategies
- Introduction to HISAT2
- Introduction to the BAM and BED formats
- Basic manipulation of BAMs with samtools, Picard, etc.
- Visualization of RNA-seq alignments in IGV
- Alignment QC Assessment
- BAM read counting and determination of variant allele expression status
- Run HISAT2 with parameters suitable for gene expression analysis
- Use samtools to explore and manipulate the features of the SAM/BAM files
- Use IGV to visualize HISAT2 alignments, view a variant position, load exon junction files, etc.
- Determine BAM-read counts at a variant position
- Use samtools flagstat, samstat, FastQC to assess quality of alignments
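As a rough illustration of what "variant allele expression status" means in the read-counting exercise above, the sketch below computes the fraction of reads that support the variant allele at one position. The function name, the input dictionary, and the counts are hypothetical and chosen only for illustration; in practice the per-base counts would come from a read-counting tool run on the BAM file.

```python
def variant_allele_fraction(base_counts, ref, alt):
    """Return the fraction of ref+alt reads that carry the alt allele.

    base_counts: dict mapping base ("A", "C", "G", "T") to read count at
    a single position (hypothetical input, not the output format of any
    specific tool). Returns None when no read supports either allele.
    """
    ref_n = base_counts.get(ref, 0)
    alt_n = base_counts.get(alt, 0)
    total = ref_n + alt_n
    if total == 0:
        return None  # position not covered, so expression status is unknown
    return alt_n / total


# Made-up counts at one variant position: 30 reference reads, 10 variant reads.
counts = {"A": 30, "G": 10, "C": 0, "T": 0}
vaf = variant_allele_fraction(counts, ref="A", alt="G")
print(vaf)
```

A fraction near 0 suggests that mostly the reference allele is expressed, a fraction near 0.5 is consistent with both alleles being expressed, and a `None` result means the position has no coverage in the RNA-seq data.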
Module 4: Expression and differential expression
- Expression estimation for known genes and transcripts
- FPKM/TPM expression estimates vs. raw counts
- Differential expression methods
- Downstream interpretation of expression and differential expression estimates
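To make the distinction between raw counts and FPKM/TPM estimates concrete, here is a minimal sketch of the two normalizations, assuming the standard definitions. The gene names, counts, and lengths are invented, and the total number of mapped fragments is approximated by the sum over the listed genes; tools such as StringTie compute these values with additional refinements, so this is an illustration rather than their implementation.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments.

    counts: dict gene -> raw fragment count
    lengths: dict gene -> transcript length in bases
    Assumes the sum of counts stands in for total mapped fragments.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    denom = sum(rates.values())
    return {g: rates[g] / denom * 1e6 for g in counts}


# Invented example data for three genes.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
for gene in counts:
    print(gene, counts[gene], round(fpkm_values[gene]), round(tpm_values[gene]))
```

In this example geneB has three times the raw count of geneA but the same FPKM and TPM, because it is three times as long. TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM.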
Module 5: Reference-free alignment
Module 6: Isoform discovery and alternative expression
- Explore the use of StringTie in reference annotation-based transcript (RABT) assembly mode and in de novo assembly mode. Both modes require a reference genome sequence.
Module 7: Genome-Free De Novo Transcript Assembly
- Reconstructing transcripts using Trinity
- Genome-free transcript quantification and differential expression analysis
Module 8: Functional Annotation and Analysis of Transcripts