Epigenomic Data Analysis


Course Objectives

High-throughput sequencing of chromatin-immunoprecipitated libraries (ChIP-seq) and of bisulfite-converted DNA (WGBS) has become increasingly common and has largely supplanted microarrays for chromatin and DNA methylation profiling. When processed appropriately, ChIP-seq data provides base-pair resolution representations of transcription factor DNA-binding events and nucleosome (histone) modifications genome-wide. Similarly, WGBS can provide a quantitative genome-wide profile of cytosine methylation.

The CBW has developed a 2-day course providing an introduction to histone ChIP-seq and WGBS data analysis followed by integrated tutorials demonstrating the use of open source ChIP-Seq and WGBS analysis packages. The tutorials are designed as self-contained units that include example data and detailed instructions for installation of all required bioinformatics tools (FASTQC, BWA, MACS2, FindER, samtools, Picard, BisSNP). The course also includes an overview of integrative epigenomic tools that have been developed to explore ChIP-Seq and WGBS data together with other epigenomic datasets such as RNA-seq, DHS-seq and ATAC-seq.

Participants will gain practical experience and skills to be able to:

  • Align ChIP-seq and WGBS sequence data to a reference genome (required)
  • Identify narrow and broad peaks from ChIP-seq data
  • Identify methylation levels from WGBS data
  • Visualize and summarize the output of ChIP-Seq and WGBS analyses
  • Explore integrative tools for epigenomic data sets

Target Audience

Graduates, postgraduates and PIs who are working with, or about to embark on, an analysis of epigenomic data, in particular ChIP-Seq and Whole-Genome Bisulfite Sequencing (WGBS) experiments. Attendees may be familiar with some aspects of ChIP-Seq or WGBS data analysis or have no direct experience. A reference genome is required and we will be working on human data sets.

Prerequisites: Basic familiarity with the Linux environment and with S, R, or Matlab. You must be able to complete and understand the following simple Linux and R tutorials (up to and including “Descriptive Statistics”) before attending:

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact course_info@bioinformatics.ca for more information.

Pre-work and pre-readings can be found at https://bioinformaticsdotca.github.io/epigenomics_2018.

Course Outline

Day 1

Module 1: Introduction to ChIP Sequencing and Analysis (Martin Hirst)

  • Basic introduction to biology of ChIP-seq
  • Experimental design and analysis considerations
  • Commonly asked questions

Lab Practical:

  • Configure, launch, and connect to a Compute Canada cloud instance
  • Introduction to the test data
  • Run FastQC
  • Examine and understand the format of raw FastQ files
  • Obtain reference genomes (fasta)
  • Perform pre-alignment QC
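
As a rough illustration of the FastQ format examined in this practical, the sketch below reads four-line records and decodes the quality string. It assumes the common Phred+33 encoding and unwrapped (single-line) sequences; the names `FastqRecord` and `read_fastq` are hypothetical and are not part of the course material or of FastQC.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str             # read identifier without the leading "@"
    sequence: str         # base calls
    qualities: List[int]  # per-base Phred scores


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-read FastQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an empty line) stops the iteration
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FastQ record: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch: " + header)
        # Assumption: Phred+33 encoding, so subtract 33 from each ASCII code.
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up example record, for illustration only.
EXAMPLE = "@read1\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.name, record.sequence, record.qualities)
```

Low scores towards the end of reads are one of the patterns that a pre-alignment QC report typically summarizes across the whole file.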

Module 2: ChIP-Seq Alignment, Peak Calling, and Visualization (Misha Bilenky)

  • Use of BWA and reference genome formatting
  • Introduction to the BAM format
  • Basic manipulation of BAMs with samtools, Picard, etc.
  • Use of MACS2/FindER to call narrow and broad peaks
  • Visualization of ChIP-seq alignments and peaks with IGV

Lab Practical:

  • Run BWA with suitable parameters
  • Use samtools to explore the features of the SAM/BAM format and perform basic manipulation of these alignment files (view, sort, index, manipulate headers, extract data, etc.)
  • Use MACS2/FindER to call narrow and broad peaks.
  • Visualize results in IGV.

Day 2

Module 3: Introduction to WGBS and Analysis (Guillaume Bourque)

  • Basic introduction to biology of WGBS
  • Experimental design and analysis considerations
  • Commonly asked questions
  • Perform in silico conversion of the reference genome
  • Convert reads and map onto the genome
  • Pile up mapped reads and call methylation profiles
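
The conversion and calling steps listed above can be sketched in a few lines. This is a simplified illustration of the general idea (C-to-T conversion for one strand, G-to-A for the other, then a methylation level computed as the fraction of unconverted cytosines at a site), not the exact procedure used by the course tools; all function names are hypothetical.

```python
from typing import Optional


def convert_c_to_t(sequence: str) -> str:
    """In silico bisulfite conversion for the forward strand (C becomes T)."""
    return sequence.upper().replace("C", "T")


def convert_g_to_a(sequence: str) -> str:
    """Equivalent conversion seen from the reverse strand (G becomes A)."""
    return sequence.upper().replace("G", "A")


def methylation_level(methylated: int, unmethylated: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    `methylated` counts reads that still show C (protected from conversion);
    `unmethylated` counts reads that show T. Returns None without coverage.
    """
    total = methylated + unmethylated
    if total == 0:
        return None
    return methylated / total


# Made-up sequence and counts, for illustration only.
reference = "ACGTCCGA"
print(convert_c_to_t(reference))
print(convert_g_to_a(reference))
print(methylation_level(8, 2))
```

Converting both the reference and the reads lets a standard aligner map bisulfite-treated reads without penalizing C/T differences; the original read bases are then compared with the unconverted reference to count methylated and unmethylated calls at each cytosine.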

Lab Practical:

  • Prepare in silico converted reference genome and reads
  • Map reads using BWA or Bismark
  • Post-process reads using samtools and nxtgen-utils
  • Call methylation profiles
  • Explore and visualize results in IGV

Module 4: Downstream Analysis and Integrative Tools (David Bujold)

  • Overview of downstream functional analysis tools
  • Identifying differentially bound or differentially methylated sites
  • Explore motifs in ChIP-seq peaks using HOMER
  • Looking for significant GO enrichment using GREAT
  • Explore available datasets from ENCODE, NIH Roadmap and IHEC using the IHEC Data Portal
  • Study disease variant enrichments using GWAS resources

Lab Practical:

  • Using peaks and methylated regions identified in Modules 2 and 3:
    • look for motifs (using HOMER)
    • GO enrichment (using GREAT).
  • Explore available reference epigenomic data sets using the IHEC Data Portal.