
Course Description

Cancer research has rapidly embraced high-throughput technologies and Cloud computing. Large amounts of data are being generated on microarray, tissue array, and next-generation sequencing platforms. Dedicated compute clouds such as the Cancer Genome Collaboratory [http://cancercollaboratory.org/] facilitate complex analyses of large cancer data sets from projects that host their data in the Cloud, such as the ICGC and PCAWG. Now more than ever, it is critical to have informatics skills, knowledge of the bioinformatics resources specific to cancer, and the ability to access and use data sets in the Cloud.

Course Objectives

This 5-day workshop will cover the key bioinformatics concepts and tools required to analyze cancer genomic data sets and access and work with data sets in the Cloud.

Participants will gain practical experience and skills to:

Visualize genomic data;
Analyze cancer -omic data for gene expression, genome rearrangement, somatic mutations, and copy number variation;
Conduct pathway analysis on the resulting cancer gene list;
Integrate clinical data;
Launch, configure, customize, and scale virtual machines (VMs);
Navigate and work with data sets from Cloud repositories; and
Follow best practices in data and workflow management.

Target Audience

Graduates, postgraduates, post-doctoral researchers, bioinformaticians, laboratory technologists, PIs, and core facility researchers whose research involves cancer genomics data. Open to participants from public health, hospital, academic, industry, or government organizations.

Prerequisites

UNIX familiarity is required.

You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact support@bioinformatics.ca for more information.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1 Lecture: Intro to Cancer Genomics

Overview of what makes cancer genomes unique and their sources of variation
Fundamental cancer genomics approaches and their respective strengths
How cancer genomics analysis can guide patient treatment
Focus on genomics and transcriptomics from bulk data
Bias in cancer datasets
Data security and privacy

Module 2 Lecture: Understanding and Visualizing data

Overview of data formats used in cancer genomics (FASTA, SAM/BAM, BED, etc.)
Overview of commonly used cancer data sources, including TCGA and EGA
Visualizing cancer data using IGV and UCSC Genome Browser
Data management best practices
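
As a small illustration of why format conventions matter when moving between files and a genome browser, the sketch below reads the three required BED columns and converts the interval to a 1-based, inclusive region string. BED coordinates are 0-based and half-open, while SAM positions and typical region strings are 1-based. The function names and the example coordinates are hypothetical and are not taken from the workshop material.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    """The three required BED columns (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Half-open intervals make the length a simple difference.
        return self.end - self.start


def parse_bed_line(line: str) -> BedInterval:
    """Parse chrom, start and end from one tab-separated BED line."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return BedInterval(chrom, int(start), int(end))


def to_one_based(interval: BedInterval) -> str:
    """Format the interval as a 1-based, inclusive region string."""
    return f"{interval.chrom}:{interval.start + 1}-{interval.end}"


# Hypothetical record; the fourth column (name) is optional and ignored here.
example = parse_bed_line("chr17\t100\t200\texample_region")
print(example.length)         # 100
print(to_one_based(example))  # chr17:101-200
```

Forgetting the one-base shift is a common source of off-by-one errors when comparing intervals from different file formats.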

Module 2 Lab Practical: Visualizing Sequencing data

Viewing and navigating sequencing data using IGV
Subsequent modules and lab practicals will use the same IGV Browser tool
Examining single nucleotide polymorphisms and structural changes in IGV

Module 3 Lecture: Genome Alignment

Overview of steps involved in an alignment pipeline
Principles of mapping reads to a reference genome (and which reference genome)
Quality control of alignment data
How cancer complicates the alignment process: tumour content, unmapped reads, coverage
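
To make the tumour content and coverage points concrete, the following sketch computes mean sequencing depth and the variant allele fraction expected for a clonal mutation in an impure sample. It assumes a diploid normal compartment and uniform coverage, which real samples only approximate; the function names and input numbers are illustrative and are not part of the workshop material.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth if reads were spread uniformly over the genome."""
    return n_reads * read_length / genome_size


def expected_vaf(purity: float, mutated_copies: int = 1,
                 tumour_copy_number: int = 2) -> float:
    """Expected variant allele fraction for a clonal somatic mutation.

    Assumes contaminating normal cells are diploid and carry no
    copies of the mutation.
    """
    tumour_alleles = purity * tumour_copy_number
    normal_alleles = (1.0 - purity) * 2
    return purity * mutated_copies / (tumour_alleles + normal_alleles)


# Hypothetical run: 600 million reads of 150 bases on a 3 Gb genome.
print(f"{mean_coverage(600_000_000, 150, 3_000_000_000):.1f}x")

# A heterozygous mutation in a copy-neutral region at 60% tumour content.
print(f"{expected_vaf(0.6):.2f}")
```

Under these assumptions, a heterozygous mutation in a sample with 60% tumour content is expected at an allele fraction of about 0.3 rather than 0.5, so fewer reads support it at a given depth.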

Module 3 Lab Practical: Genome Alignment

Explore cancer genome sequencing raw files
Perform quality control
Align processed reads to genome
Review alignment metrics

Module 4 Lecture: Somatic Alterations

Overview of common alterations found in cancer and their importance, including structural and copy number variations
Overview of a single nucleotide polymorphism (SNP) calling pipeline
Strategies for detection of somatic mutations and factors considered by SNP callers
Binomial mixture models to model allelic counts
Simultaneous analysis of tumour and normal data
Sources of artifacts and false positives
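
The binomial mixture idea can be sketched for a single sample and a single site as follows: each genotype is a mixture component with its own probability of producing a variant read, and Bayes' rule turns the observed allelic counts into genotype posteriors. The per-genotype read probabilities and the priors below are assumed values chosen for illustration, not parameters of any particular caller, and somatic callers typically extend this to tumour and normal data jointly.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k variant reads among n reads with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumed probability that a read shows the variant allele, per genotype.
# The values for AA and BB leave room for sequencing error.
VARIANT_READ_PROB = {"AA": 0.01, "AB": 0.5, "BB": 0.99}

# Hypothetical prior weights of the mixture components.
PRIOR = {"AA": 0.98, "AB": 0.01, "BB": 0.01}


def genotype_posteriors(alt_count: int, depth: int) -> dict:
    """Posterior probability of each genotype given the allelic counts."""
    weighted = {
        genotype: PRIOR[genotype]
        * binomial_pmf(alt_count, depth, VARIANT_READ_PROB[genotype])
        for genotype in PRIOR
    }
    total = sum(weighted.values())
    return {genotype: w / total for genotype, w in weighted.items()}


# Hypothetical site: 12 variant reads out of 40.
posteriors = genotype_posteriors(alt_count=12, depth=40)
print(max(posteriors, key=posteriors.get))
```

With 12 of 40 reads supporting the variant, the heterozygous component dominates even though its prior weight is small, because the homozygous components explain the counts very poorly.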

Module 4 Lab Practical: Identifying and Annotating SNPs

Analysis of bulk whole-genome sequencing data for SNPs
Visualization and interpretation of SNP calls in IGV
Annotate variant files to determine effects of variation

Module 5 Lecture: Copy Number Alterations

Importance of copy number alterations in cancer
Methods for detecting copy number alterations
Tools for evaluating CNAs in HT-seq data

Module 5 Lab Practical: Identifying and Annotating CNAs

Using a CNA caller tool for CNA detection
Visualization and interpretation of CNA calls in IGV
Annotation of variation

Module 6 Lecture: Transcriptomics

Overview of RNA sequencing methods and their challenges
Outline of an RNA-seq analysis pipeline
Approach to calculating differential gene expression
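
As a minimal sketch of the arithmetic behind a differential expression estimate, the code below scales raw read counts to counts per million (CPM) and computes a log2 fold change between one tumour and one normal sample. It is deliberately simplified: real analyses use replicates and a statistical model, as implemented in tools such as DESeq2 or edgeR, and the gene names, counts, and pseudocount here are hypothetical.

```python
from math import log2


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts by library size to counts per million."""
    library_size = sum(counts.values())
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


def log2_fold_change(tumour_counts: dict, normal_counts: dict,
                     pseudocount: float = 1.0) -> dict:
    """log2(tumour / normal) on CPM values for genes present in both samples.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    tumour_cpm = counts_per_million(tumour_counts)
    normal_cpm = counts_per_million(normal_counts)
    return {
        gene: log2((tumour_cpm[gene] + pseudocount)
                   / (normal_cpm[gene] + pseudocount))
        for gene in tumour_cpm
        if gene in normal_cpm
    }


# Hypothetical read counts for two genes in each sample.
tumour = {"GENE_A": 300, "GENE_B": 700}
normal = {"GENE_A": 100, "GENE_B": 900}

for gene, lfc in log2_fold_change(tumour, normal).items():
    print(f"{gene}\t{lfc:+.2f}")
```

A positive value indicates higher normalized expression in the tumour sample; on its own it says nothing about statistical significance.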

Module 6 Lab Practical: Gene Expression Analysis

Run a complete differential expression pipeline on bulk cancer data, from obtaining data files to data alignment and gene expression calling
Visualization and interpretation of differential expression calls

Module 7 Lecture: From Alteration to Gene Effect

Significance of understanding somatic alterations and their effects on genes, proteins, and pathways
Effect interpretation databases such as COSMIC, dbSNP, and CIViC
Tools for predicting the effect of alterations on genes: SnpEff, Ensembl VEP
Limitations of annotation-based effect interpretation

Module 7 Lab Practical: