Course Description

With the introduction of high-throughput sequencing platforms, sequencing-based approaches have become feasible for many research projects. However, managing and interpreting the large volume of sequence data these technologies produce remains a challenge. The CBW has developed a popular 2-day course covering the bioinformatics tools available for managing and interpreting high-throughput sequencing data. The course focuses on Illumina reads, although much of the material applies to reads from other sequencing platforms.

Course Objectives

Beginning with an understanding of the workflow that turns raw platform images into sequence reads, participants will gain the practical experience and skills to:

Assess sequence quality
Map sequence data onto a reference genome
Perform de novo assembly tasks
Quantify sequence data
Integrate biological context with sequence information

Target Audience

This workshop is intended for graduate students, post-doctoral fellows, clinical fellows and investigators involved in analyzing data from high-throughput sequencing platforms.

Prerequisites

UNIX familiarity is required.

You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux. Most computers purchased in the past 3-4 years should meet these requirements. If you do not have access to your own computer, please contact support@bioinformatics.ca for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to High Throughput Sequencing (Jared Simpson)

Overview of high-throughput sequencing technologies: major players and their strengths and weaknesses

Module 2: Genome visualization (Hamza Farooq)

Data file formats used in genome visualization (FASTA, BED, WIG, GFF, etc.)
Introduction to genomic data visualization tools and how they can be used to visualize sequencing read data: UCSC, IGV, Savant, GBrowse
Integrating other data sets into a browser

Lab Practical

Variant detection and visualization within the genome using IGV

Module 3: Genome Alignment (Mathieu Bourgey)

What is involved in mapping reads to a reference genome
What are the FASTQ and SAM/BAM file formats
Some common terminology used to describe alignments

Lab Practical

Genome alignment exercise

Module 4: Small-Variant Calling and Annotation (Mathieu Bourgey)

SNPs, SNVs, and short INDELs, and why to look for them
Base quality (BQ) recalibration, duplicate removal, aligner choice
Detecting variants and factors taken into account by the SNP callers
Different types of SNP calling: haploid/diploid, trio, somatic mutations, pooled
Distinguishing reliable SNP calls from the millions detected
INDEL cleaning
Standard file formats for SNPs
Introduction to SNP calling tools and how they compare with each other

Lab Practical

SNP detection exercise

Module 5: Structural Variation (Mathieu Bourgey)

Structural variants (SVs), different types, mechanisms that give rise to SVs, and how SVs and CNVs differ
Differences between human and model organism genomes
Detecting SVs via sequencing (read pair, read depth, combined approach, local de novo assembly) and which SV types are detectable by which strategies
Introduction to SV detection tools
File formats used to describe SVs

Lab Practical

SV discovery in a single human genome
Brief intro to SV visualization and interpretation

Module 6: De Novo Assembly (Jared Simpson)

Fundamentals of de novo assembly
Data structures used by assemblers (de Bruijn graphs and overlap graphs)
Common steps that assemblers perform
Overview of commonly used software

Lab Practical

Perform a de novo assembly task

Course material available here

Workshop Details:

Duration: 2 days

Start: Jun 20, 2019

End: Jun 21, 2019

Location: Montreal, Quebec Canada

Status: Registration Closed

Workshop Ended

Limited to: 30 participants

Lead Instructors:

Jared Simpson

Mathieu Bourgey

Open Access Content:

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.

Posted on:

April 21, 2022

(2019) Informatics on High Throughput Sequencing Data