Informatics on High Throughput Sequencing Data

Course Objectives
With the introduction of high-throughput sequencing platforms, it is becoming feasible to use sequencing-based approaches in many research projects. However, how to manage and interpret the large volume of sequence data these technologies produce is less clear. The CBW has developed a popular 2-day course covering the bioinformatics tools available for managing and interpreting high-throughput sequencing data. The focus is on Illumina reads, although the material applies to reads from other sequencers as well.
Beginning with the workflow that moves from platform images to sequence generation, participants will gain the practical experience and skills to:
- Assess sequence quality
- Map sequence data onto a reference genome
- Perform de novo assembly tasks
- Quantify sequence data
- Integrate biological context with sequence information
Target Audience
This workshop is intended for graduate students, post-doctoral fellows, clinical fellows and investigators involved in analyzing data from high-throughput sequencing platforms.
Prerequisites: Familiarity with UNIX is required and can be gained through online tutorials. You should be comfortable with the UNIX concepts covered in tutorials 1-3 at [http://www.ee.surrey.ac.uk/Teaching/Unix/].
You will also need your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux. Most computers purchased in the past 3-4 years are likely to meet these requirements. If you do not have access to your own computer, you may borrow one from the CBW. Please contact course_info@bioinformatics.ca for more information.
Pre-work and pre-readings can be found at https://bioinformaticsdotca.github.io/htseq_2018.
Course Material
Module 1: Introduction to High Throughput Sequencing (Jared Simpson)
Instructor(s): Jared Simpson
Content:
- Overview of high-throughput sequencing technologies: major players and their strengths and weaknesses
Module 2: Genome visualization (Hamza Farooq)
Instructor(s): Hamza Farooq
Content:
- Data file formats used in genome visualization (FASTA, BED, WIG, GFF, etc.)
- Introduction to genomic data visualization tools and how they can be used to visualize sequencing read data: UCSC, IGV, Savant, GBrowse
- Integrating other data sets into a browser
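BED intervals are 0-based and half-open, while GFF intervals are 1-based and closed, so the same feature has a different start coordinate in the two formats. A minimal Python sketch of the conversion, assuming a tab-separated BED line with at least the three required columns (`Interval` and `bed_line_to_interval` are illustrative names, not part of any tool):

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive (GFF convention)


def bed_line_to_interval(line: str) -> Interval:
    """Convert one BED line (0-based, half-open) to 1-based, closed coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    chrom, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
    if bed_end < bed_start:
        raise ValueError(f"end before start: {line!r}")
    # chromStart is 0-based, so shift it by one; chromEnd is exclusive in BED
    # and inclusive in GFF, so its value stays the same.
    return Interval(chrom, bed_start + 1, bed_end)


# The first 100 bases of chr1 are "chr1 0 100" in BED and 1..100 in GFF.
print(bed_line_to_interval("chr1\t0\t100"))
```

Feature length is `end - start` in BED but `end - start + 1` in GFF; both describe the same bases.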
Lab Practical
Instructor(s): Hamza Farooq
Content:
- Variant detection and visualization within the genome using IGV
Presentation file(s):
PDF
Module 3: Genome Alignment (Mathieu Bourgey)
Instructor(s): Mathieu Bourgey
Content:
- What is involved in mapping reads to a reference genome
- The FASTQ and SAM/BAM file formats
- Common terminology used to describe alignments
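A FASTQ record is four lines: an `@` header, the bases, a `+` separator line, and a quality string with one character per base. A minimal Python sketch of reading records and decoding qualities, assuming unwrapped four-line records and the Phred+33 encoding used by current Illumina pipelines (some older Illumina data used Phred+64); the function names are illustrative:

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from 4-line records (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        read_id = (header[1:].split() or [""])[0]  # text before the first whitespace
        yield read_id, seq, [ord(c) - 33 for c in qual]


def error_probability(q: int) -> float:
    """A Phred score Q encodes an estimated base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


example = io.StringIO("@read1 sample=A\nACGT\n+\nII#5\n")
for read_id, seq, quals in read_fastq(example):
    print(read_id, seq, quals)  # read1 ACGT [40, 40, 2, 20]
```

A Phred score of 20 therefore means an estimated 1% chance that the base call is wrong, and 30 means 0.1%.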
Lab Practical
Module 4: Small-Variant Calling and Annotation (Mathieu Bourgey)
Instructor(s): Mathieu Bourgey
Content:
- SNPs, SNVs and short INDELs, and why to look for them
- Base quality (BQ) recalibration, duplicate removal and aligner choice
- Detecting variants, and the factors SNP callers take into account
- Different types of SNP calling: haploid/diploid, trio, somatic mutations, pooled
- Determining which SNPs are good from the millions detected
- INDEL cleaning
- Standard file formats for SNPs
- Introduction to SNP calling tools and how they compare with each other
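SNP and short-indel calls are commonly stored in the Variant Call Format (VCF), whose data lines begin with eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. A minimal Python sketch that parses those columns and applies a simple hard filter; the QUAL cutoff of 30 is an illustrative assumption rather than a course recommendation, sample genotype columns are ignored, and all names are hypothetical:

```python
from typing import Dict, List, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int                 # VCF positions are 1-based
    ref: str
    alts: List[str]
    qual: Optional[float]    # None when QUAL is "."
    filter_status: str
    info: Dict[str, str]


def parse_vcf_line(line: str) -> Variant:
    """Parse the 8 fixed columns of a VCF data line (header lines start with '#')."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError(f"expected at least 8 columns: {line!r}")
    chrom, pos, _id, ref, alt, qual, filt, info = cols[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            info_dict[key] = value  # flag entries such as "DB" map to ""
    return Variant(chrom, int(pos), ref, alt.split(","),
                   None if qual == "." else float(qual), filt, info_dict)


def is_snv(v: Variant) -> bool:
    """True when REF is one base and every ALT allele is a single A, C, G or T."""
    return len(v.ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in v.alts)


def passes(v: Variant, min_qual: float = 30.0) -> bool:
    """Hypothetical hard filter: FILTER is PASS or missing, and QUAL >= min_qual."""
    return v.filter_status in ("PASS", ".") and v.qual is not None and v.qual >= min_qual


snv = parse_vcf_line("chr1\t12345\t.\tA\tG\t50\tPASS\tDP=20;DB")
print(is_snv(snv), passes(snv))  # True True
```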
Lab Practical
Module 5: Structural Variation (Mathieu Bourgey)
Instructor(s): Mathieu Bourgey
Content:
- Structural variants (SVs), different types, mechanisms that give rise to SVs, and how SVs and CNVs differ
- Differences between human and model organism genomes
- Detecting SVs via sequencing (read pair, read depth, combined approach, local de novo assembly) and which SV types are detectable by which strategies
- Introduction to SV detection tools
- File formats used to describe SVs
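The read-pair strategy looks for pairs whose mapped distance or orientation disagrees with what the sequencing library should produce. A simplified Python sketch of classifying a single pair, assuming a forward-reverse (FR) paired-end library, a known insert-size mean and standard deviation, and a fixed read length; the `k = 3` standard-deviation cutoff and all names are illustrative assumptions, and real SV callers cluster many such pairs before reporting an event:

```python
from typing import NamedTuple


class ReadPair(NamedTuple):
    chrom1: str
    pos1: int        # leftmost mapped position of read 1
    reverse1: bool   # True if read 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(p: ReadPair, mean_insert: float, sd_insert: float,
                  read_length: int, k: float = 3.0) -> str:
    """Label one pair by the SV signature it could support (FR library assumed)."""
    if p.chrom1 != p.chrom2:
        return "translocation"
    if p.reverse1 == p.reverse2:
        return "inversion"  # both reads on the same strand
    (left_pos, left_rev), (right_pos, _) = sorted(
        [(p.pos1, p.reverse1), (p.pos2, p.reverse2)])
    if left_rev:
        return "tandem_duplication"  # reads point away from each other
    span = right_pos + read_length - left_pos  # approximate fragment length
    if span > mean_insert + k * sd_insert:
        return "deletion"   # reads map farther apart than expected
    if span < mean_insert - k * sd_insert:
        return "insertion"  # reads map closer together than expected
    return "concordant"


pair = ReadPair("chr1", 1000, False, "chr1", 3000, True)
print(classify_pair(pair, mean_insert=400, sd_insert=50, read_length=100))
```

The example pair spans about 2,100 bp against an expected 400 bp, so it is labelled as deletion evidence.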
Lab Practical
Instructor(s): Mathieu Bourgey
Content:
- SV discovery in a single human genome
- Brief intro to SV visualization and interpretation
Presentation file(s):
PDF
Module 6: De Novo Assembly (Jared Simpson)
Instructor(s): Jared Simpson
Content:
- Fundamentals of de novo assembly
- Data structures used by assemblers (de Bruijn graphs and overlap graphs)
- Common steps that assemblers perform
- Overview of commonly used software
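In a de Bruijn graph, each distinct k-mer from the reads becomes an edge joining its (k-1)-base prefix to its (k-1)-base suffix, and contigs are spelled by walking paths through the graph. A toy Python sketch, assuming error-free reads from a single strand (no reverse complements, k-mer counts or error correction, all of which real assemblers need); the function names are illustrative:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: Dict[str, Set[str]] = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


def extend_contig(graph: Dict[str, Set[str]], start: str) -> str:
    """Follow single-successor edges from start, appending one base per step."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # the path revisits a node, e.g. because of a repeat
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


graph = build_de_bruijn(["ACGTTG", "GTTGCA"], k=4)
print(extend_contig(graph, "ACG"))  # ACGTTGCA
```

The walk stops at the first branch; branches introduced by sequencing errors and repeats are a main reason assemblers simplify the graph before reporting contigs.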
Lab Practical