Informatics for RNA-seq Analysis

Course Objectives
High-throughput sequencing of RNA libraries (RNA-seq) has become increasingly common and has largely supplanted gene microarrays for transcriptome profiling. When processed appropriately, RNA-seq data can provide a considerably more detailed view of the transcriptome. The CBW has developed a 3-day course that introduces RNA-seq data analysis and follows with integrated tutorials demonstrating the use of popular RNA-seq analysis packages. The tutorials are designed as self-contained units that include example data (Illumina paired-end RNA-seq data) and detailed instructions for installing all required bioinformatics tools (HISAT, StringTie, etc.).
Participants will gain practical experience and skills to be able to:
- Perform command-line, Linux-based analysis on the cloud
- Assess quality of RNA-seq data
- Align RNA-seq data to a reference genome
- Estimate known gene and transcript expression
- Perform differential expression analysis
- Discover novel isoforms
- Visualize and summarize the output of RNA-seq analyses in R
- Assemble transcripts from RNA-seq data
Target Audience
Graduates, postgraduates, and PIs working on or about to embark on an analysis of RNA-seq data. Attendees may be familiar with some aspects of RNA-seq analysis (e.g. gene expression analysis) or have no direct experience.
Prerequisites: Basic familiarity with the Linux environment and S, R, or Matlab. You must be able to complete and understand the following simple Linux and R tutorials (up to and including “Descriptive Statistics”) before attending:
- UNIX Tutorial (up to and including Tutorial Four) [http://www.ee.surrey.ac.uk/Teaching/Unix/]
- Quick & Dirty Guide to R [http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/quick&dirty_R.txt]
You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact course_info@bioinformatics.ca for more information.
Pre-work and pre-readings can be found at https://bioinformaticsdotca.github.io/rnaseq_2018.
Course Material
Module 1: Introduction to Cloud Computing (Obi Griffith)
Instructor(s): Obi Griffith
Content:
- Introduction to cloud computing concepts
Presentation file(s):
PDF
Lab Practical
Instructor(s): Obi Griffith
Content:
- Learn to configure, launch, and connect to an Amazon cloud instance.
Presentation file(s):
PDF
Module 2: Introduction to RNA sequencing and analysis (Malachi Griffith)
Instructor(s): Malachi Griffith
Content:
- Basic introduction to biology of RNA-seq
- Experimental design and analysis considerations
- Commonly asked questions
Presentation file(s):
PDF
Lab Practical
Module 3: RNA-seq alignment and visualization (Fouad Yousif)
Instructor(s): Fouad Yousif
Content:
- RNA-seq alignment challenges and common questions
- Alignment strategies
- Introduction to HISAT2
- Introduction to the BAM and BED formats
- Basic manipulation of BAMs with samtools, Picard, etc.
- Visualization of RNA-seq alignments - IGV
- Alignment QC Assessment
- BAM read counting and determination of variant allele expression status
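As a rough illustration of the last topic above, variant allele expression can be summarized as the fraction of reads covering a position that carry the variant base. The sketch below is a minimal example under stated assumptions: the function name `allele_fractions` and the string of observed bases are hypothetical, and in practice the per-read bases would be extracted from a BAM file with a read-counting tool rather than typed in by hand.

```python
from collections import Counter


def allele_fractions(bases, ref, alt):
    """Count reference and variant bases at one position.

    `bases` is a string holding the base each covering read shows at the
    position (hypothetical input; normally derived from a BAM file).
    Returns (ref_count, alt_count, alt_fraction).
    """
    counts = Counter(b.upper() for b in bases)
    ref_n = counts[ref.upper()]
    alt_n = counts[alt.upper()]
    total = ref_n + alt_n
    # Avoid dividing by zero when no read supports either allele
    fraction = alt_n / total if total else float("nan")
    return ref_n, alt_n, fraction


# Hypothetical example: ten reads cover a G>A variant position
observed = "GGGAGAGGAG"
ref_n, alt_n, vaf = allele_fractions(observed, ref="G", alt="A")
print(ref_n, alt_n, round(vaf, 2))
```

A variant fraction near 0.5 would be consistent with both alleles being expressed, while a value near 0 or 1 suggests that mostly one allele is expressed; low coverage makes either reading unreliable.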
Presentation file(s):
PDF
Lab Practical
Instructor(s): Fouad Yousif
Content:
- Run HISAT2 with parameters suitable for gene expression analysis
- Use samtools to explore and manipulate the features of the SAM/BAM files
- Use IGV to visualize HISAT2 alignments, view a variant position, load exon junctions files, etc.
- Determine BAM read counts at a variant position
- Use samtools flagstat, samstat, and FastQC to assess quality of alignments
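When exploring SAM/BAM records as in the exercises above, the FLAG column is often the least readable field: it is a single integer in which each bit records one property of the read. The following minimal sketch decodes a flag value using the bit meanings defined in the SAM format specification. The short names in `SAM_FLAG_BITS` and the function `decode_flag` are illustrative choices, not part of samtools.

```python
# Bit values follow the SAM format specification; the short names are
# illustrative labels chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# 99 and 147 are a common flag pair for a properly paired read and its mate
print(decode_flag(99))
print(decode_flag(147))
```

Summaries such as those reported by samtools flagstat are, in essence, tallies of how many records have each of these bits set.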
Module 4: Expression and differential expression (Obi Griffith)
Instructor(s): Obi Griffith
Content:
- Expression estimation for known genes and transcripts
- FPKM/TPM expression estimates vs. raw counts
- Differential expression methods
- Downstream interpretation of expression and differential expression estimates
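To make the distinction between raw counts and FPKM/TPM estimates concrete, the sketch below computes both normalized measures from a small table of counts. It is a simplified illustration, not the method used by StringTie or any other tool: the gene names, counts, and lengths are hypothetical, and the library size used for FPKM is assumed to be the sum of the counts shown, whereas a real analysis would use the total number of mapped fragments.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments.

    Assumption: the library size is the sum of the counts provided.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    denom = sum(rates.values())
    return {g: rate / denom * 1e6 for g, rate in rates.items()}


# Hypothetical genes: raw counts differ 7-fold, but so do the lengths (bp)
counts = {"geneA": 100, "geneB": 200, "geneC": 700}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

In this example the raw counts differ across genes only because the genes differ in length, so both normalized measures come out equal for all three. TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM. Count-based differential expression methods generally expect raw counts rather than either normalized value.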
Presentation file(s):
PDF
Lab Practical
Instructor(s): Obi Griffith
Content:
- Generate gene/transcript expression estimates with StringTie
- Perform differential expression analysis with Ballgown
- Summarize and visualize differential expression results
Module 5: Reference free alignment (Malachi Griffith)
Instructor(s): Malachi Griffith
Content:
- Explore the use of Kallisto to obtain abundance estimates without first aligning reads to a reference genome (Kallisto instead pseudoaligns reads to a transcriptome index).
Presentation file(s):
PDF
Module 6: Genome-Guided and Genome-Free Transcriptome Assembly (Brian Haas)
Instructor(s): Brian Haas
Content:
- Explore use of StringTie in reference annotation based transcript (RABT) assembly mode and de novo assembly mode. Both modes require a reference genome sequence.
- Reconstructing transcripts using Trinity
- Genome-free transcript quantification and differential expression analysis
Lab Practical
Instructor(s): Brian Haas
Content:
- Assemble RNA-seq transcripts
- Explore use of StringTie in reference annotation based transcript (RABT) assembly mode and de novo assembly mode. Both modes require a reference genome sequence.
Presentation file(s):
PDF
Module 7: Functional Annotation and Analysis of Transcripts (Brian Haas)
Instructor(s): Brian Haas
Content:
- Predict coding regions of transcripts
- Using Trinotate to capture evidence for transcript function
Presentation file(s):
PDF
Lab Practical
Instructor(s): Brian Haas
Content:
- Explore TrinotateWeb for navigating transcript annotation and expression data
Presentation file(s):
PDF