Informatics on High-throughput Sequencing Data (2016)

Course Objectives

A poster announcing this workshop can be found here

With the introduction of high-throughput sequencing platforms, it is becoming feasible to consider sequencing approaches to address many research projects. However, it is less clear how to manage and interpret the large volume of sequence data that these technologies produce. The CBW has developed a popular 2-day course covering the bioinformatics tools available for managing and interpreting high-throughput sequencing data. The focus is on Illumina reads, although the information is applicable to reads from all sequencers.

Beginning with an understanding of the workflow involved in moving from platform images to sequence generation, participants will gain the practical experience and skills to:

  • Assess sequence quality
  • Map sequence data onto a reference genome (required)
  • Quantify sequence data
  • Integrate biological context with sequence information

Target Audience

This workshop is intended for graduate students, post-doctoral fellows, clinical fellows and investigators involved in analyzing data from HT sequencing platforms.

Prerequisite: UNIX familiarity is required. Familiarity can be gained through online activities. You should be familiar with these UNIX concepts (tutorial 1-3).

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 1GB RAM, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact the CBW for more information.

Application Information

Applications for this workshop will be accepted until May 9, 2016. Applications received prior to this closing date qualify for the early registration fee. Applications received after the closing date will only be accepted pending space availability and will be subject to the late registration fee.

Students accepted into the workshop will be notified of their acceptance after May 9, 2016. This notification will include payment instructions.

CBW welcomes applications from all interested participants regardless of geographical location. Non-Canadians, however, are not eligible for registration or travel awards.

Course Outline

Day 1

Module 1 - Introduction to HT-seq and Cloud Computing (2016) (Faculty: Francis Ouellette)

  • Overview of high-throughput sequencing technologies: major players and their strengths and weaknesses
  • Introduction to cloud computing concepts

Lab Practical: Learn to configure, launch and connect to an Amazon cloud instance

Module 2 - Genome Alignment (2016) (Faculty: Mathieu Bourgey)

  • What is involved in mapping reads to a reference genome
  • What are the FASTQ and SAM/BAM file formats
  • Some common terminology used to describe alignments

Lab Practical: Genome alignment exercise
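
As an informal illustration of the FASTQ format named in Module 2 (this sketch is not taken from the course material), the Python snippet below reads four-line FASTQ records and converts the quality string to Phred scores. It assumes the Phred+33 encoding used by recent Illumina pipelines and unwrapped, four-line records. The function names and the example reads are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def phred_scores(quality, offset=33):
    """Convert an ASCII quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Mean Phred score of one read; 0.0 for an empty read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


# Two made-up reads, used only to exercise the functions above.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGTA\n+\n!!5II\n"
records = list(read_fastq(StringIO(EXAMPLE)))

for read_id, seq, qual in records:
    print(read_id, seq, mean_quality(qual))
```

A per-read mean is only a crude summary. Dedicated quality-control tools report per-position distributions, which is usually what matters when deciding whether to trim reads.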

Module 3 - Genome Visualization (2016) (Faculty: Sorana Morrissy)

  • Other data file formats used in genome visualization (FASTA, BED, WIG, GFF, etc)
  • Introduction to genomic data visualization tools and how they can be used to visualize sequencing read data: UCSC, IGV, Savant, GBrowse
  • Integrating other data sets into a browser

Lab Practical: Variant detection and visualization within the genome using Savant

Module 4 - De Novo Assembly (2016) (Faculty: Jared Simpson)

  • Fundamentals of de novo assembly
  • Data structures used by assemblers (de Bruijn graphs and overlap graphs)
  • Common steps that assemblers perform
  • Overview of commonly used software
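
To give a rough idea of the de Bruijn graph listed among the topics above, the following Python sketch builds one from a set of reads: each k-mer contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. This is a teaching sketch under simplifying assumptions (error-free reads, a single strand, no reverse complements), not how any particular assembler is implemented. The function name and the example reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some k-mer.

    Repeated edges are kept, so the list length reflects how often
    a k-mer was observed across the reads.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix node -> suffix node
    return dict(graph)


# Two made-up overlapping reads.
graph = de_bruijn_graph(["ACGTAC", "GTACG"], k=3)

for node in sorted(graph):
    print(node, "->", graph[node])
```

Real assemblers add steps on top of this structure, such as removing edges caused by sequencing errors and collapsing unbranched paths into contigs.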

Day 2

Module 5 - Genome Variation (2016) (Faculty: Guillaume Bourque)

  • What are SNPs, SNVs, and short-INDELs? Why would I want to look for them?
  • What should I have done up to this point? (e.g. BQ recalibration, duplicate removal, aligner choice)
  • How are these variants detected? What factors are taken into account by the SNP callers?
  • Different types of SNP calling: haploid/diploid, trio, somatic mutations, pooled
  • YAY, WE FOUND MILLIONS OF SNPs!!!! How do I know if any of these are good?
  • INDEL cleaning
  • Are there any standard file formats for SNPs?
  • Introduction to SNP calling tools and how they compare with each other

Lab Practical: SNP detection exercise

Module 6 - Genome Structural Variation (2016) (Faculty: Guillaume Bourque)

  • What are structural variants (SVs)? What are the different types? What mechanisms give rise to SVs? How are SVs and CNVs different?
  • What differences exist between human and model organism genomes?
  • How can we detect SVs via sequencing? Discuss detection strategies (read pair, read depth, combined approach, local de novo assembly). Which SV types are detectable by which strategies?
  • Introduction to SV detection tools
  • What file formats are used to describe SVs?

Lab Practical:

  • SV discovery in a single human genome
  • What should I have done up to this point?
  • Brief intro to SV visualization and interpretation

Module 7 - Bringing it Together with Galaxy (2016) (Faculty: Francis Ouellette)

  • Galaxy - A pipeline tool for high-throughput sequence data analysis

Lab Practical: Galaxy analysis of HT-seq data repeating Day 1

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.