Cancer research has rapidly adopted high-throughput technologies and Cloud computing. Large amounts of data are being generated by microarray, tissue array, and next-generation sequencing platforms. Dedicated compute clouds such as the Cancer Genome Collaboratory (http://cancercollaboratory.org/) facilitate complex analyses of large cancer data sets from projects that host their data in the Cloud, such as the ICGC and PCAWG. Now more than ever, it is critical to have informatics skills, to know the bioinformatics resources specific to cancer, and to know how to access and use the data sets available in the Cloud.
This 5-day workshop will cover the key bioinformatics concepts and tools required to analyze cancer genomic data sets and access and work with data sets in the Cloud.
Participants will gain practical experience and skills to:
- Visualize genomic data
- Analyze cancer –omic data for gene expression, genome rearrangement, somatic mutations, and copy number variation
- Analyze and conduct pathway analysis on the resultant cancer gene list
- Integrate clinical data
- Launch, configure, customize, and scale virtual machines (VM)
- Navigate and work with data sets from Cloud repositories
- Follow best practices in data and workflow management
This workshop is intended for clinical researchers, research scientists, post-doctoral fellows, and graduate students with cancer genomics research projects.
UNIX and R familiarity is required.
You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact firstname.lastname@example.org for other possible options.
This workshop requires participants to complete pre-workshop tasks and readings.
Module 1: Introduction to Cancer Genomics (Instructor: Trevor Pugh)
- Overview of cancer genomics field
- Common applications of HT technologies in cancer genomics
- Concepts and case studies of cancer genomics from the literature: Cancer genetics, Pharmacogenomics, Diagnostic vs. prognostic markers and druggable targets
Module 2: Ethics of Data Usage and Security (Instructor: Mark Phillips)
- Introduction to Cloud computing and virtual machines (VMs)
- Ethical conduct when using genomic data
- Security of big data sets in the Cloud
- VM security and usage best practices: ssh keys, ports, snapshot of VM without privileged data, and shut down of VM when not in use.
Module 3: Databases and Visualization Tools (Francis Ouellette)
- Overview of cancer-specific databases, as well as genome browsing and cancer genome browsing.
- The databases: Collaboratory, ICGC portal, TCGA, COSMIC, dbSNP, etc.
- The browser tools: IGV, UCSC
- How to use the genome browsers to visualize transcripts, mutations, and other cancer genome features. Subsequent modules and lab practicals will use the same browser tools.
Module 4: Genome Alignment (Jared Simpson)
- What is involved in mapping reads to a reference genome
- What are the FASTQ and SAM/BAM file formats
- Some common terminology used to describe alignments
- Run a read alignment
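As a small illustration of the FASTQ format covered in this module, the following Python sketch reads four-line FASTQ records and decodes their quality strings. It is not part of the workshop materials: the function name `read_fastq`, the record type, and the example reads are hypothetical, and the sketch assumes unwrapped four-line records with Phred+33 quality encoding.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream.

    Assumes Phred+33 quality encoding by default. Reading stops at the
    end of the stream or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality_line = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"header does not start with '@': {header!r}")
        if len(sequence) != len(quality_line):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one Phred score as an ASCII offset.
        qualities = [ord(char) - phred_offset for char in quality_line]
        yield FastqRecord(header[1:], sequence, qualities)


# Hypothetical two-read example.
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
for record in read_fastq(example):
    mean_quality = sum(record.qualities) / len(record.qualities)
    print(record.name, record.sequence, mean_quality)
```

Production pipelines would normally rely on an established parser rather than hand-written code; the sketch is only meant to show what each of the four lines in a record carries.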
Module 5: Genome Assembly (Jared Simpson)
- Fundamentals of de novo assembly
- Data structures used by assemblers (de Bruijn graphs and overlap graphs)
- Common steps that assemblers perform
- Overview of commonly used software
- Run a read assembly.
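To make the de Bruijn graph idea from this module concrete, here is a minimal Python sketch that builds such a graph from a set of reads. The function name `build_de_bruijn_graph` and the example reads are hypothetical. The sketch ignores sequencing errors, reverse complements, and k-mer coverage, all of which real assemblers have to handle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    Duplicate edges collapse because successors are stored in a set.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping hypothetical reads; with k=3 they form a single path
# AC -> CG -> GT -> TA, which spells the sequence ACGTA.
graph = build_de_bruijn_graph(["ACGT", "CGTA"], k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", sorted(successors))
```

An assembler would then look for paths through this graph to reconstruct contigs; that traversal step is left out here.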
Module 6: Somatic Copy Number Changes (Sorana Morrissy)
- Importance of copy number alterations in cancer
- Methods for detecting copy number alterations
- Tools for evaluating CNAs in HT-seq data
- Hands-on lab exercises using a CNA caller tool for CNA detection in SNP arrays.
Module 7: Somatic Mutations and Annotations (Sorana Morrissy)
- Relevance of detecting somatic mutations in cancer genomics
- Strategies for detection of somatic mutations and factors considered by SNP callers
- Binomial mixture models to model allelic counts
- Simultaneous analysis of tumor and normal data
- Sources of artifacts and false positives
- Hands-on lab exercises using SNP calling tools for somatic mutation detection and visualization in IGV.
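The binomial mixture idea listed above can be sketched in a few lines of Python. The example below assigns a site to one of three mixture components based on its allelic counts. Everything in it is an illustrative assumption rather than the method of any particular caller: the function names, the three expected variant-allele fractions (0.01, 0.5, 0.99), and the uniform priors are all hypothetical, and the parameters are fixed by hand instead of being fitted to data.

```python
from math import comb
from typing import List, Sequence


def binomial_pmf(successes: int, trials: int, probability: float) -> float:
    """Probability of exactly `successes` in `trials` independent draws."""
    return (comb(trials, successes)
            * probability ** successes
            * (1 - probability) ** (trials - successes))


def genotype_posteriors(
    alt_count: int,
    depth: int,
    alt_fractions: Sequence[float] = (0.01, 0.5, 0.99),  # assumed, not fitted
    priors: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),     # assumed uniform
) -> List[float]:
    """Posterior probability of each mixture component for one site.

    Each component models the variant read count as binomial with its own
    expected variant-allele fraction; Bayes' rule combines them with the priors.
    """
    weighted = [prior * binomial_pmf(alt_count, depth, fraction)
                for fraction, prior in zip(alt_fractions, priors)]
    total = sum(weighted)
    return [value / total for value in weighted]


# Hypothetical site: 10 variant reads out of 20 total.
print(genotype_posteriors(alt_count=10, depth=20))
```

With 10 of 20 reads supporting the variant, the middle (heterozygous-like) component receives almost all of the posterior mass. Somatic callers go further than this, for example by estimating the mixture parameters from the data and analyzing tumor and normal samples jointly.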
Module 8: Gene Expression Profiling (Florence Cavalli)
- The Technology Platform: high-throughput sequencing
- Variety of platforms and their differences
- Experimental design considerations
- Limitations of experiments
- The Analysis Tools: Outline of an RNA-Seq analysis pipeline, Tools for analysis of RNA-Seq data
- Analysis of RNA-Seq data.
Module 9: Gene Fusion Discovery and Genomic Rearrangements (Brian Haas)
- Genomic rearrangement and its effect on the transcriptome
- Biological relevance of gene fusions in cancer biology
- The technology platform: RNA-Seq
- Overview of the RNA-Seq protocol
- Experimental design considerations
- The analysis tools: Alignment based fusion discovery, Paired end RNA-Seq alignments and gene fusion evidence, Discerning true fusions from artifacts, Assembly based fusion discovery
- Benefits/drawbacks of assembly methods
- Identifying gene fusions from RNA-Seq assemblies
- Clinical applications
- Hands-on exercise using deFuse to identify fusions. Visualize RNA-Seq fusion reads in IGV, plot exon expression of fused genes, and use the UCSC genome browser to view fusions in tandem with other genomic features.
Module 10: Genes to Pathways (Instructor: Jüri Reimand)
- Introduction to gene lists
- Gene annotations: Gene identifiers and pathway databases
- What is pathway enrichment analysis?
- During this hands-on session, students will have the opportunity to play with gene identifiers, and to create a network from gene-set enrichment results using Cytoscape.
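One common way to answer "what is pathway enrichment analysis?" is a one-sided hypergeometric (Fisher-style) test on the overlap between a gene list and a pathway. The Python sketch below shows that calculation. The function name `enrichment_p_value` and the example numbers are hypothetical, and this is only one of several enrichment statistics in use; it is not a description of what any specific tool in the workshop computes.

```python
from math import comb


def enrichment_p_value(
    population_size: int,
    pathway_size: int,
    gene_list_size: int,
    overlap: int,
) -> float:
    """P(X >= overlap) when X follows a hypergeometric distribution.

    population_size: number of annotated genes in the background
    pathway_size:    number of background genes belonging to the pathway
    gene_list_size:  number of genes in the list being tested
    overlap:         number of list genes that belong to the pathway
    """
    max_overlap = min(pathway_size, gene_list_size)
    total_draws = comb(population_size, gene_list_size)
    tail = sum(
        comb(pathway_size, x)
        * comb(population_size - pathway_size, gene_list_size - x)
        for x in range(overlap, max_overlap + 1)
    )
    return tail / total_draws


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# a 200-gene list, and 8 genes shared between the list and the pathway.
print(enrichment_p_value(20000, 100, 200, 8))
```

When many pathways are tested at once, the resulting p-values need a multiple-testing correction, which this sketch does not perform.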
Module 11: Variants to Networks (Instructor: Robin Haw)
- Overview of pathway and network analysis
- Basic network concepts
- Types of pathway and network information
- Introduction to the Common Workflow Language
Lab Practical: Reactome (Robin Haw)
- Workflow of tools and steps
- Reactome FI
Module 12: Integration of Clinical Data (Instructor: Lauren Erdman)
- Introduction to correlating clinical outcomes with cancer genomic data
- How do variants discovered in genomic data result in clinical outcomes?
- Challenges with integration of heterogeneous data types (clinical vs. genomics)
- Survival analysis (univariate and multivariate)
- Analysis of clinical cancer data using R
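As a brief illustration of univariate survival analysis, the following Python sketch computes a Kaplan-Meier survival estimate from follow-up times and event indicators. The workshop itself uses R for this module; the Python function `kaplan_meier` and the five-patient data set here are hypothetical and are meant only to show how the estimator handles censored observations.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[bool],
) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) was observed at that time,
            False if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_times = sorted({t for t, observed in zip(times, events) if observed})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        deaths = sum(1 for u, observed in zip(times, events) if u == t and observed)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cohort: follow-up in months; False marks a censored patient.
for time, probability in kaplan_meier([5, 8, 8, 12, 15],
                                      [True, True, False, True, False]):
    print(time, round(probability, 3))
```

For this toy cohort the estimate steps down to roughly 0.8, 0.6, and 0.3 at months 5, 8, and 12. Censored patients lower the number at risk without counting as events, which is why the patient censored at month 8 still contributes to the risk set at that time.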
Duration: 5 days
Start: Jun 03, 2019
End: Jun 07, 2019
Course Mode: Onsite
Status: Registration Closed
Open Access Content:
Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.