Bioinformatics for Cancer Genomics (2017)

Course Objectives

A poster announcing this workshop can be found here

Cancer research has rapidly embraced high-throughput technologies, using various microarray, tissue array, and next-generation sequencing platforms. The result has been a rapid increase in cancer data output and data types. Now more than ever, informatics skills and knowledge of the bioinformatics resources specific to cancer are critical.

The CBW has developed a 5-day workshop covering the key bioinformatics concepts and tools required to analyze cancer genomic data sets.

Participants will gain practical experience and skills to:

  • Visualize genomic data;
  • Analyze cancer -omics data for gene expression, genome rearrangement, somatic mutations, and copy number variation;
  • Conduct pathway analysis on the resulting cancer gene list;
  • Integrate clinical data.

Target Audience

This workshop is developed for clinical researchers, research scientists, post-doctoral fellows, and graduate students with cancer genomics research projects.

Prerequisite: Familiarity with UNIX and R is required and can be gained through online activities. You should be familiar with these UNIX concepts (tutorials 1-3) and these R concepts (chapters 1-5), or review the past Statistics tutorials provided by CBW. A useful hands-on tool for getting started in R is Swirl.

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 1GB RAM, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact course_info@bioinformatics.ca for more information.

Course Outline

Day 1

Module 1: Introduction to Cancer Genomics (2017) (Instructor: Trevor Pugh)

  • Overview of cancer genomics field
  • Common applications of HT technologies in cancer genomics
  • Concepts and case studies of cancer genomics from the literature:
    • Cancer genetics
    • Pharmacogenomics
    • Diagnostic vs. prognostic markers and druggable targets
  • Data security and privacy

Module 2: Databases and Visualization Tools (2017) (Instructor: Francis Ouellette)

Overview of cancer-specific databases, as well as genome browsing and cancer genome browsing.
The databases:

  • ICGC portal, TCAG, etc.
  • COSMIC, dbSNP, etc.

The browser tools:

  • IGV
  • UCSC

Lab Practical: How to use the genome browsers to visualize transcripts, mutations, and other cancer genome features. Subsequent modules and lab practicals will use the same browser tools.

Logging into the Cloud

UNIX Review until 8pm

Day 2

Module 3: Genome Alignment and Assembly (2017) (Instructor: Jared Simpson)

  • What is involved in mapping reads to a reference genome
  • What are the FASTQ and SAM/BAM file formats
  • Some common terminology used to describe alignments
  • Fundamentals of de novo assembly
  • Data structures used by assemblers (de Bruijn graphs and overlap graphs)
  • Common steps that assemblers perform
  • Overview of commonly used software

Lab Practical:

  • Genome alignment exercise
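
As a rough illustration of the de Bruijn graph data structure listed in this module, the following minimal sketch builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in a set of reads. The function name `de_bruijn_graph` and the toy reads are invented for this example; real assemblers also handle sequencing errors, reverse complements, and graph simplification, none of which is shown here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it.

    Every k-mer found in a read contributes one edge: prefix -> suffix.
    Repeated k-mers produce repeated edges, which reflects coverage.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Toy reads, invented for illustration only.
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_graph(reads, k=3)

for node in sorted(graph):
    print(node, "->", graph[node])
```

Walking the edges of this small graph from AC through CG, GT, and TC to CA spells out ACGTCA, the sequence that the two overlapping reads jointly support.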

Module 4: Copy Number Alterations (2017) (Instructor: Fong Chun Chan)

  • Importance of copy number alterations in cancer
  • Methods for detecting copy number alterations
  • Tools for evaluating CNAs in HT-seq data

Lab Practical: Hands-on lab exercises using a CNA caller tool for CNA detection in SNP arrays.

Day 3

Module 5: Somatic Mutations and Annotations (2017) (Instructor: Fong Chun Chan)

  • Relevance of detecting somatic mutations in cancer genomics
  • Strategies for detection of somatic mutations and factors considered by SNV callers
  • Binomial mixture models to model allelic counts
  • Simultaneous analysis of tumor and normal data
  • Sources of artifacts and false positives

Lab Practical: Hands-on lab exercises using SNV calling tools for somatic mutation detection.
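
To give a sense of what a binomial mixture model of allelic counts looks like, here is a minimal sketch that treats the number of variant-supporting reads at one site as drawn from one of three binomial components (homozygous reference, heterozygous, homozygous variant) and reports the posterior probability of each. The function names, the expected variant fractions, and the prior weights are all assumptions chosen for illustration. They are not taken from any particular SNV caller, and tumour data would in practice also need to account for purity and copy number.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_posteriors(alt_count, depth,
                        alt_fractions=(0.01, 0.5, 0.99),
                        priors=(0.98, 0.01, 0.01)):
    """Posterior over (hom-ref, het, hom-alt) given variant read counts.

    alt_fractions: assumed expected variant-read fraction per genotype.
    priors: assumed mixing weights for the three components.
    """
    weighted = [w * binomial_pmf(alt_count, depth, p)
                for w, p in zip(priors, alt_fractions)]
    total = sum(weighted)
    return [x / total for x in weighted]


# Hypothetical site: 12 of 30 reads support the variant allele.
post = genotype_posteriors(alt_count=12, depth=30)
print([round(p, 4) for p in post])
```

With 12 variant reads out of 30, the heterozygous component dominates even though its prior weight is small, because the other two components make such a count extremely unlikely.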

Module 6: Gene Expression Profiling (2017) (Instructor: Fouad Yousif)

  • Role of gene expression profiles in the cancer continuum
  • The Technology Platform: High-throughput sequencing
    • Variety of platforms and their differences
    • Experimental design considerations
    • Limitations of experiments
  • The Analysis Tools:
    • Outline of an RNA-Seq analysis pipeline
    • Tools for analysis of RNA-Seq data

Lab Practical: Analysis of RNA-Seq data.

Integrated Assignment: Bringing it all together with Galaxy.

Day 4

Module 7: Gene Fusions and Rearrangements (2017) (Instructor: Andrew McPherson)

  • Genomic rearrangement and its effect on the transcriptome
  • Biological relevance of gene fusions in cancer biology
  • The technology platform: RNA-Seq
    • Overview of the RNA-Seq protocol
    • Experimental design considerations
  • The analysis tools:
    • Alignment-based fusion discovery
      • Paired-end RNA-Seq alignments and gene fusion evidence
      • Discerning true fusions from artifacts
    • Assembly-based fusion discovery
      • Benefits/drawbacks of assembly methods
      • Identifying gene fusions from RNA-Seq assemblies
  • Clinical applications

Lab Practical:
Hands-on exercise using deFuse to identify fusions. Visualize RNA-Seq fusion reads in IGV, plot exon expression of fused genes, and use the UCSC Genome Browser to view fusions in tandem with other genomic features.

Module 8a: Variants to Networks (2017) (Instructor: Jüri Reimand)

Part 1: How to annotate variants and prioritize potentially relevant ones

  • Reference databases: germline variant frequency databases
  • Gene mapping
  • Gene product effect type
  • Conservation
  • Missense effect scoring (SIFT, PolyPhen2, Mutation Assessor)

Lab Practical:
Hands-on session using ANNOVAR to annotate somatic variants.

Part 2: From genes to pathways

  • Introduction to gene lists
  • Gene annotations: Gene identifiers and pathway databases
  • What is pathway enrichment analysis?

Lab Practical:
During this hands-on session, students will have the opportunity to work with gene identifiers and to create a network from gene-set enrichment results using Cytoscape.
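
One common way to frame pathway enrichment analysis is as an over-representation test: given a gene list, how surprising is its overlap with a pathway if the genes had been drawn at random? The sketch below computes a one-sided hypergeometric p-value for that question. The function name `enrichment_p_value` and the toy numbers are invented for illustration; the tools used in the lab may apply different statistics and will also correct for multiple testing, which is not shown.

```python
from math import comb


def enrichment_p_value(universe, pathway, hits, overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    universe: number of genes in the background set
    pathway:  number of background genes annotated to the pathway
    hits:     size of the gene list being tested
    overlap:  number of gene-list members that fall in the pathway

    Returns P(X >= overlap) when `hits` genes are drawn without replacement.
    """
    upper = min(pathway, hits)
    total = comb(universe, hits)
    tail = sum(comb(pathway, k) * comb(universe - pathway, hits - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Toy numbers, invented for illustration: 20 background genes,
# a 5-gene pathway, and a 4-gene list with 3 genes in the pathway.
p = enrichment_p_value(universe=20, pathway=5, hits=4, overlap=3)
print(round(p, 4))
```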

Day 5

Module 8b: Variants to Networks (2017) (Instructors: Lincoln Stein and Robin Haw)

Part 3: Network Analysis using Reactome

  • Overview of pathway and network analysis
  • Basic network concepts
  • Types of pathway and network information
    • Focus on transcription factor regulatory networks
    • Pathway databases: KEGG, Reactome

Lab Practical: Reactome (Robin Haw)

  • Workflow of tools and steps
  • Reactome FI

Module 9: Clinical Data Integration (2017) (Instructor: Anna Goldenberg)

  • Introduction to correlating clinical outcomes with genomic data
  • How do variants discovered in genomic data result in clinical outcomes?
  • Challenges with integration of heterogeneous data types (clinical vs. genomics)
  • Survival analysis (univariate and multivariate)

Lab Practical: Analysis of clinical cancer data using R
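
The univariate survival analysis mentioned in this module is often introduced through the Kaplan-Meier estimator. The following is a minimal sketch of that estimator, written in Python purely for illustration (the lab itself uses R). The function name `kaplan_meier` and the follow-up data are invented, and the sketch assumes the usual convention that subjects censored at a given time are still at risk for events at that same time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)

for t, s in curve:
    print(t, round(s, 3))
```

Comparing such curves between patient groups defined by a genomic feature (for example, mutated versus wild-type) is one simple way to relate genomic data to clinical outcome.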

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.