
Course Description

Cancer research has rapidly embraced high-throughput technologies and Cloud computing. Large amounts of data are being generated by microarray, tissue array, and next-generation sequencing platforms. Dedicated compute clouds such as the Cancer Genome Collaboratory (http://cancercollaboratory.org/) facilitate complex analyses of big cancer data sets from projects hosting their data in the Cloud, such as the ICGC and PCAWG. Now more than ever, it is critical to have informatics skills, knowledge of the bioinformatics resources specific to cancer, and the ability to access and use data sets in the Cloud.

This 5-day workshop will cover the key bioinformatics concepts and tools required to analyze cancer genomic data sets and access and work with data sets in the Cloud.

Course Objectives

Participants will gain practical experience and skills to:

Visualize genomic data
Analyze cancer –omic data for gene expression, genome rearrangement, somatic mutations, and copy number variation
Analyze and conduct pathway analysis on the resultant cancer gene list
Integrate clinical data
Launch, configure, customize, and scale virtual machines (VM)
Navigate and work with data sets from Cloud repositories
Follow best practices in data and workflow management

Target Audience

This workshop is intended for clinical researchers, research scientists, post-doctoral fellows, and graduate students with cancer genomics research projects.

Prerequisites

UNIX and R familiarity is required.

You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact support@bioinformatics.ca for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to Cancer Genomics (Instructor: Trevor Pugh)

Overview of cancer genomics field
Common applications of HT technologies in cancer genomics
Concepts and case studies of cancer genomics from the literature: Cancer genetics, Pharmacogenomics, Diagnostic vs. prognostic markers and druggable targets

Module 2: Ethics of Data Usage and Security (Instructor: Mark Phillips)

Introduction to Cloud computing and virtual machines (VMs)
Ethical conduct when using genomic data
Security of big data sets in the Cloud
VM security and usage best practices: ssh keys, ports, taking VM snapshots without privileged data, and shutting down VMs when not in use

Module 3: Databases and Visualization Tools (Instructor: Francis Ouellette)

Overview of cancer-specific databases, as well as genome browsing and cancer genome browsing
The databases: Collaboratory, ICGC portal, TCGA, COSMIC, dbSNP, etc.
The browser tools: IGV, UCSC

Lab Practical

How to use the genome browsers to visualize transcripts, mutations, and other cancer genome features. Subsequent modules and lab practicals will use the same browser tools.

Module 4: Genome Alignment (Instructor: Jared Simpson)

What is involved in mapping reads to a reference genome
What are the FASTQ and SAM/BAM file formats
Some common terminology used to describe alignments
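
As an informal illustration of the FASTQ format named above, the following minimal Python sketch reads four-line FASTQ records and decodes their quality strings. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences; the function name `read_fastq` and the example read are hypothetical and are not taken from the workshop materials.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding (quality = ASCII code minus 33).
        yield header[1:], sequence, [ord(ch) - 33 for ch in quality]


# Hypothetical single-read example.
example = "@read1\nACGT\n+\nIIII\n"
records = list(read_fastq(StringIO(example)))
print(records)
```

In practice, established libraries handle edge cases (wrapped lines, other encodings) that this sketch ignores.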

Lab Practical

Run a read alignment

Module 5: Genome Assembly (Instructor: Jared Simpson)

Fundamentals of de novo assembly
Data structures used by assemblers (de Bruijn graphs and overlap graphs)
Common steps that assemblers perform
Overview of commonly used software
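
To give a rough idea of the de Bruijn graph data structure listed above, here is a minimal Python sketch that builds one from a set of reads. Nodes are (k-1)-mers and each k-mer contributes one edge from its prefix to its suffix. The function name `de_bruijn`, the toy reads, and the choice of k = 3 are illustrative assumptions; real assemblers also track edge multiplicity, handle reverse complements, and correct errors.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph


# Two hypothetical overlapping reads.
graph = de_bruijn(["ACGTC", "CGTCA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Walking the edges of this toy graph from AC spells out ACGTCA, the sequence both reads were drawn from.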

Lab Practical

Run a read assembly.

Module 6: Somatic Copy Number Changes (Instructor: Sorana Morrissy)

Importance of copy number alterations in cancer
Methods for detecting copy number alterations
Tools for evaluating CNAs in HT-seq data

Lab Practical

Hands-on lab exercises using a CNA caller tool for CNA detection in SNP arrays.

Module 7: Somatic Mutations and Annotations (Instructor: Sorana Morrissy)

Relevance of detecting somatic mutations in cancer genomics
Strategies for detection of somatic mutations and factors considered by SNP callers
Binomial mixture models to model allelic counts
Simultaneous analysis of tumor and normal data
Sources of artifacts and false positives
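
The idea behind modeling allelic counts with a binomial mixture can be sketched as follows. Each genotype class is assumed to produce variant reads at its own expected fraction, and the posterior probability of each class follows from the binomial likelihood of the observed counts. The function name `genotype_posteriors`, the three expected fractions, and the uniform priors are hypothetical placeholders chosen for illustration, not the parameters of any particular caller.

```python
from math import comb


def genotype_posteriors(alt, depth,
                        alt_fractions=(0.01, 0.5, 0.99),
                        priors=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior over genotype classes given `alt` variant reads out of `depth`.

    alt_fractions and priors are illustrative assumptions only.
    """
    # Binomial likelihood of the observed counts under each class.
    likelihoods = [
        comb(depth, alt) * f ** alt * (1 - f) ** (depth - alt)
        for f in alt_fractions
    ]
    weighted = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical site: 10 variant reads out of 20.
posteriors = genotype_posteriors(alt=10, depth=20)
print(posteriors)
```

Somatic callers extend this idea by considering tumor and normal counts jointly, which this single-sample sketch does not attempt.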

Lab Practical

Hands-on lab exercises using SNP calling tools for somatic mutation detection and visualization in IGV.

Module 8: Gene Expression Profiling (Instructor: Florence Cavalli)

The Technology Platform: high-throughput sequencing
Variety of platforms and their differences
Experimental design considerations
Limitations of experiments
The Analysis Tools: Outline of an RNA-Seq analysis pipeline, Tools for analysis of RNA-Seq data

Lab Practical

Analysis of RNA-Seq data.

Module 9: Gene Fusion Discovery and Genomic Rearrangements (Instructor: Brian Haas)

Genomic rearrangement and its effect on the transcriptome
Biological relevance of gene fusions in cancer biology
The technology platform: RNA-Seq
Overview of the RNA-Seq protocol
Experimental design considerations
The analysis tools: Alignment based fusion discovery, Paired end RNA-Seq alignments and gene fusion evidence, Discerning true fusions from artifacts, Assembly based fusion discovery
Benefits/drawbacks of assembly methods
Identifying gene fusions from RNA-Seq assemblies
Clinical applications

Lab Practical

Hands-on exercise using deFuse to identify fusions. Visualize RNA-Seq fusion reads in IGV, plot exon expression of fused genes, and use the UCSC genome browser to view fusions in tandem with other genomic features.

Module 10: Genes to Pathways (Instructor: Jüri Reimand)

Introduction to gene lists
Gene annotations: Gene identifiers and pathway databases
What is pathway enrichment analysis?
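
One common way to frame pathway enrichment analysis is as an over-representation test: given a gene list, how surprising is its overlap with a pathway if genes were drawn at random from the background? The Python sketch below computes a one-sided hypergeometric p-value for that question. The function name `enrichment_p_value` and the small numbers in the example are hypothetical; the tools used in the workshop may use different statistics and apply multiple-testing correction.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random
    from n_background genes, of which n_pathway belong to the pathway."""
    total = comb(n_background, n_list)
    upper = min(n_pathway, n_list)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 3 in the pathway,
# a gene list of 2, both of which fall in the pathway.
p_value = enrichment_p_value(n_background=10, n_pathway=3, n_list=2, n_overlap=2)
print(p_value)
```

A small p-value suggests the pathway is over-represented in the gene list relative to chance.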

Lab Practical

During this hands-on session, students will have the opportunity to play with gene identifiers, and to create a network from gene-set enrichment results using Cytoscape.