Informatics for RNA-Seq Analysis (2017)

Course Objectives

A poster announcing this workshop can be found here

High-throughput sequencing of RNA libraries (RNA-seq) has become increasingly common and has largely supplanted gene expression microarrays for transcriptome profiling. When processed appropriately, RNA-seq data can provide a considerably more detailed view of the transcriptome. The CBW has developed a 3-day course providing an introduction to RNA-seq data analysis, followed by integrated tutorials demonstrating the use of popular RNA-seq analysis packages. The tutorials are designed as self-contained units that include example data (Illumina paired-end RNA-seq data) and detailed instructions for installation of all required bioinformatics tools (HISAT2, StringTie, etc.). De novo assembly of transcripts will also be covered.

Participants will gain practical experience and skills to be able to:

  • Perform command-line, Linux-based analysis on the cloud
  • Assess quality of RNA-seq data
  • Align RNA-seq data to a reference genome
  • Estimate known gene and transcript expression
  • Perform differential expression analysis
  • Discover novel isoforms
  • Assemble RNA-seq data into transcripts
  • Visualize and summarize the output of RNA-seq analyses in R

Target Audience

Graduates, postgraduates and PIs working with, or about to embark on, an analysis of RNA-seq data. Attendees may be familiar with some aspects of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.

Prerequisites for attendance:

Basic familiarity with the Linux environment and with S, R, or Matlab. Participants must be able to complete and understand the following simple Linux and R tutorials (up to and including "Descriptive Statistics") before attending:

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 1GB RAM, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact the CBW for more information.

Course Outline

Day 1

Module 1: Introduction to Cloud Computing (2017) (Instructor: Obi Griffith)

  • Introduction to cloud computing concepts

Lab Practical:

  • Learn to configure, launch, and connect to an Amazon cloud instance.

Module 2: Introduction to RNA Sequencing and Analysis (2017) (Instructor: Malachi Griffith)

  • Basic introduction to biology of RNA-seq
  • Experimental design and analysis considerations
  • Commonly asked questions

Lab Practical:

  • Introduction to the test data
  • Examine and understand the format of raw FastQ files
  • Obtain reference genomes (fasta) and gene annotation resources (GTF/GFF)
  • Perform pre-alignment QC
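As a rough illustration of the FastQ format examined in this practical, the sketch below reads four-line FastQ records and converts the quality string to Phred scores. It assumes the Phred+33 encoding used by current Illumina output; the function name `read_fastq` and the two example reads are made up for illustration and are not part of the course material.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"Malformed FastQ header: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Assumption: Phred+33 encoding, so '!' is Q0 and 'I' is Q40.
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Two hypothetical reads, held in memory instead of a real .fastq file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
records = list(read_fastq(example))

for read_id, sequence, scores in records:
    print(read_id, sequence, "mean quality:", mean(scores))
```

Dedicated QC tools report far more than a mean quality per read; the point here is only to show what the four lines of a record contain.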

Module 3: RNA-Seq Alignment and Visualization (2017) (Instructor: Fouad Yousif)

  • Use of HISAT2
  • Introduction to the BAM format
  • Basic manipulation of BAMs with samtools, Picard, etc.
  • Visualization of RNA-seq alignments - IGV
  • BAM read counting and determination of variant allele expression status

Lab Practical:

  • Run HISAT2 with parameters suitable for gene expression analysis
  • Use samtools to explore the features of the SAM/BAM format and perform basic manipulation of these alignment files (view, sort, index, manipulate headers, extract data, etc.)
  • Use IGV to visualize HISAT2 alignments, view a variant position, load exon junctions files, etc.
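To make the SAM/BAM features explored in this practical more concrete, here is a small sketch that splits one SAM alignment line into its first six mandatory fields and decodes the bitwise FLAG value. The bit meanings follow the SAM specification; the helper names (`decode_flag`, `parse_sam_line`) and the example alignment line are hypothetical.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def parse_sam_line(line):
    """Parse the first six mandatory columns of a SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


# A made-up alignment line, for illustration only.
line = "read1\t99\tchr22\t1000\t60\t4M\t=\t1200\t204\tACGT\tIIII"
alignment = parse_sam_line(line)
print(alignment["qname"], alignment["rname"], alignment["pos"])
print(decode_flag(alignment["flag"]))
```

In practice the same information is available directly from `samtools view` and `samtools flags`; the sketch only shows how the FLAG integer packs several yes/no properties into one number.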

Integrated assignment:

  • Using a subset of data, assess the prostate cancer specific expression of the PCA3 gene.

Day 2

Module 4: Expression and Differential Expression (2017) (Instructor: Obi Griffith)

  • Get FPKM style expression estimates using StringTie
  • Perform differential expression analysis with Ballgown
  • Perform summary analysis with Ballgown and custom R code
  • Downstream interpretation of expression analysis (multiple testing, clustering, heatmaps, classification, pathway analysis, etc.) will also be discussed.
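For readers unfamiliar with "FPKM style" estimates, the following sketch applies the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments) and the related TPM rescaling. This is not StringTie's internal method, which derives its estimates from assembled read coverage; the gene names, fragment counts, transcript lengths and library size are invented for illustration.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def tpm_from_fpkm(fpkm_values):
    """Rescale FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm_values.values())
    return {gene: value / total * 1e6 for gene, value in fpkm_values.items()}


# Hypothetical data: gene -> (fragment count, transcript length in bp).
counts = {"geneA": (500, 2000), "geneB": (1000, 1000)}
total_mapped = 1_000_000  # assumed library size in mapped fragments

fpkm_values = {
    gene: fpkm(fragments, length, total_mapped)
    for gene, (fragments, length) in counts.items()
}
# TPM here is normalized over the two listed genes only.
tpm_values = tpm_from_fpkm(fpkm_values)

for gene in counts:
    print(gene, "FPKM:", fpkm_values[gene], "TPM:", round(tpm_values[gene], 1))
```

Normalizing by both transcript length and library size is what makes these values comparable across genes and, with caveats, across samples.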

Lab Practical:

  • Run StringTie and Ballgown
  • Explore the output of these in R
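The multiple testing issue mentioned in this module arises because thousands of genes are tested at once. One common correction is the Benjamini-Hochberg procedure, sketched below in plain Python. The function name and the example p-values are hypothetical, and the sketch is not meant to reproduce the exact output of any particular package.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is never
    # larger than the adjusted value of the next-larger p-value.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)

for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f}  adjusted = {q:.4f}")
```

Genes are then typically called significant when the adjusted value falls below a chosen false discovery rate threshold, such as 0.05.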

Module 5: Reference Free Alignment (2017) (Instructor: Malachi Griffith)

  • Explore the use of Kallisto to get abundance estimates without first aligning to a reference.

Module 6: Isoform Discovery and Alternative Expression (2017) (Instructor: Malachi Griffith)

  • Explore the use of StringTie in reference annotation-based transcript (RABT) assembly mode and in de novo assembly mode. Both modes require a reference genome sequence.

Lab Practical:

  • Run StringTie in alternate modes more conducive to isoform discovery and explore the results

Day 3

Module 7: Genome-Free De Novo Transcript Assembly (2017) (Instructor: Brian Haas)

  • Reconstructing transcripts using Trinity
  • Genome-free transcript quantification and differential expression analysis

Lab Practical:

  • Assemble RNA-Seq transcripts

Module 8: Functional Annotation and Analysis of Transcripts (2017) (Instructor: Brian Haas)

  • Predict coding regions of transcripts
  • Using Trinotate to capture evidence for transcript function

Lab Practical:

  • Explore TrinotateWeb for navigating transcript annotation and expression data

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.