Epigenomic Data Analysis

Course Objectives
High-throughput sequencing of chromatin-immunoprecipitated libraries (ChIP-seq) and of bisulfite-converted DNA (WGBS) has become increasingly common and has largely supplanted microarrays for chromatin and DNA methylation profiling. When processed appropriately, ChIP-seq data provide base-pair resolution representations of transcription factor DNA-binding events and nucleosome (histone) modifications genome-wide. Similarly, WGBS can provide a quantitative, genome-wide profile of cytosine methylation.
The CBW has developed a 2-day course providing an introduction to histone ChIP-seq and WGBS data analysis followed by integrated tutorials demonstrating the use of open source ChIP-Seq and WGBS analysis packages. The tutorials are designed as self-contained units that include example data and detailed instructions for installation of all required bioinformatics tools (FASTQC, BWA, MACS2, FindER, samtools, Picard, BisSNP). The course also includes an overview of integrative epigenomic tools that have been developed to explore ChIP-Seq and WGBS data together with other epigenomic datasets such as RNA-seq, DHS-seq and ATAC-seq.
Participants will gain practical experience and skills to be able to:
- Align ChIP-seq and WGBS sequence data to a reference genome (required)
- Identify narrow and broad peaks from ChIP-seq data
- Identify methylation levels from WGBS data
- Visualize and summarize the output of ChIP-Seq and WGBS analyses
- Explore integrative tools for epigenomic data sets
Target Audience
Graduates, postgraduates and PIs working with or about to embark on an analysis of epigenomic data and in particular of ChIP-Seq and Whole-Genome Bisulfite Sequencing (WGBS) experiments. Attendees may be familiar with some aspect of ChIP-Seq or WGBS data analysis or have no direct experience. A reference genome is required and we will be working on human data sets.
Prerequisites: Basic familiarity with Linux environment and S, R, or Matlab. Must be able to complete and understand the following simple Linux and R tutorials (up to and including “Descriptive Statistics”) before attending:
- UNIX Tutorial (up to and including Tutorial Four) [http://www.ee.surrey.ac.uk/Teaching/Unix/]
- Quick & Dirty Guide to R [http://ww2.coastal.edu/kingw/statistics/R-tutorials/text/quick&dirty_R.txt]
You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, recent versions of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact course_info@bioinformatics.ca for more information.
Pre-work and pre-readings can be found at https://bioinformaticsdotca.github.io/epigenomics_2018.
Course Material
Module 1: Introduction to ChIP Sequencing and Analysis (Martin Hirst)
Instructor(s): Martin Hirst
Content:
- Basic introduction to biology of ChIP-seq
- Experimental design and analysis considerations
- Commonly asked questions
Lab Practical
Instructor(s): Martin Hirst
Content:
- Configure, launch, and connect to a Compute Canada cloud instance
- Introduction to the test data
- Run FastQC
- Examine and understand the format of raw FastQ files
- Obtain reference genomes (fasta)
- Perform pre-alignment QC
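As an illustration of the raw FastQ format examined in this practical, the following is a minimal Python sketch, not part of the course material. It assumes the common four-line record layout and Phred+33 quality encoding; the function names and the two example reads are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream.

    Reading stops at end of file or at the first empty line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def mean_quality(qual, offset=33):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical two-read example standing in for a real FASTQ file.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))

for name, seq, qual in records:
    print(name, seq, mean_quality(qual))
# read1 ACGT 40.0
# read2 GGCA 15.0
```

Tools such as FastQC report per-base and per-read quality summaries of this kind in much more detail; the sketch only shows how the quality line encodes scores.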
Presentation file(s):
PDF
Module 2: ChIP-Seq Alignment, Peak Calling, and Visualization (Misha Bilenky)
Instructor(s): Misha Bilenky
Content:
- Use of BWA and reference genome formatting
- Introduction to the BAM format
- Basic manipulation of BAMs with samtools, Picard, etc.
- Use of MACS2/FindER to call narrow and broad peaks
- Visualization of ChIP-seq alignments and peaks with IGV
Lab Practical
Instructor(s): Misha Bilenky
Content:
- Run BWA with suitable parameters
- Use samtools to explore the features of the SAM/BAM format and perform basic manipulation of these alignment files (view, sort, index, manipulate headers, extract data, etc.)
- Use MACS2/FindER to call narrow and broad peaks.
- Visualize results in IGV.
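To make the SAM/BAM exploration step more concrete, here is a small Python sketch that decodes the bitwise FLAG field of a SAM record and applies a simple read filter of the kind often used before peak calling. The flag bit values follow the SAM specification; the helper names, the example record, and the `min_mapq=5` threshold are assumptions chosen for illustration, not settings taken from the course.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAGS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

# Reads carrying any of these bits are dropped by the filter below.
EXCLUDE_BITS = 0x4 | 0x100 | 0x200 | 0x400 | 0x800


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


def parse_sam_line(line):
    """Parse the leading mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


def keep_read(record, min_mapq=5):
    """Keep mapped, primary, non-duplicate reads above a MAPQ threshold.

    The threshold is a hypothetical default, not a recommendation.
    """
    return not (record["flag"] & EXCLUDE_BITS) and record["mapq"] >= min_mapq


# Hypothetical alignment line for demonstration.
line = "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII"
record = parse_sam_line(line)
print(decode_flag(record["flag"]))
# ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
print(keep_read(record))
# True
```

In practice the same filtering is usually done directly with samtools on BAM files; the sketch is only meant to show what the FLAG and MAPQ columns encode.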
Module 3: Introduction to WGBS and Analysis (Guillaume Bourque)
Instructor(s): Guillaume Bourque
Content:
- Basic introduction to biology of WGBS
- Experimental design and analysis considerations
- Commonly asked questions
- Perform in silico conversion of the reference genome
- Convert reads and map onto the genome
- Pileup mapped reads and call methylation profile
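The in silico conversion step can be sketched in a few lines of Python. This is a simplified illustration of the three-letter alignment idea, in which every C in the reference and in the reads is replaced by T (and every G by A for the opposite strand) so that bisulfite-induced changes no longer cause mismatches during mapping. The function names, dictionary keys, and example sequences are hypothetical and do not reproduce any particular tool.

```python
def convert_c_to_t(seq):
    """Fully C->T convert a sequence (forward-strand representation)."""
    return seq.upper().replace("C", "T")


def convert_g_to_a(seq):
    """Fully G->A convert a sequence (reverse-strand representation)."""
    return seq.upper().replace("G", "A")


def convert_reference(seq):
    """Return both converted versions of a reference sequence."""
    return {"C2T": convert_c_to_t(seq), "G2A": convert_g_to_a(seq)}


# Hypothetical reference fragment and a bisulfite-treated read from it.
# In the read, the first C is methylated (kept as C) and the other two
# Cs are unmethylated (sequenced as T).
reference = "ACGTCCGA"
read = "ACGTTTGA"

converted = convert_reference(reference)
print(converted["C2T"])  # ATGTTTGA
print(converted["G2A"])  # ACATCCAA

# After conversion the read matches the converted reference exactly,
# regardless of its methylation state.
print(convert_c_to_t(read) == converted["C2T"])  # True
```

The original, unconverted read sequence is kept alongside the alignment so that methylation can be called afterwards by comparing it with the unconverted reference.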
Presentation file(s):
PDF
Lab Practical
Instructor(s): Guillaume Bourque
Content:
- Prepare in silico converted reference genome and reads
- Map reads using BWA or Bismark
- Post-process reads using samtools and nxtgen-utils
- Call methylation profiles
- Explore and visualize results in IGV
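The methylation-calling step can be illustrated with a minimal pileup over reference cytosines. The Python sketch below makes strong simplifying assumptions: forward-strand reads only, gapless alignments given as 0-based start positions, and no base-quality or SNP filtering. At each reference C, a read base of C counts as methylated and T as unmethylated, and the methylation level is the methylated fraction. All names and example data are hypothetical.

```python
from collections import defaultdict


def call_methylation(reference, alignments, min_coverage=1):
    """Return {position: (methylated, depth, level)} for reference cytosines.

    alignments: iterable of (start, read_sequence) pairs, 0-based and gapless,
    holding the original (unconverted) read sequences.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "C":
                counts[pos][0] += 1  # protected from conversion: methylated
            elif base == "T":
                counts[pos][1] += 1  # converted: unmethylated
            # Any other base is ignored (sequencing error or variant).
    calls = {}
    for pos, (meth, unmeth) in sorted(counts.items()):
        depth = meth + unmeth
        if depth >= min_coverage:
            calls[pos] = (meth, depth, meth / depth)
    return calls


# Hypothetical reference with cytosines at positions 1, 4 and 5,
# covered by three full-length reads.
reference = "ACGTCCGA"
alignments = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (0, "ACGTTCGA")]

calls = call_methylation(reference, alignments)
for pos, (meth, depth, level) in calls.items():
    print(pos, meth, depth, round(level, 2))
# 1 2 3 0.67
# 4 0 3 0.0
# 5 2 3 0.67
```

Real callers also handle the reverse strand, distinguish CpG from non-CpG contexts, and exclude positions overlapping known variants, none of which is attempted here.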
Presentation file(s):
PDF
Module 4: Downstream Analysis and Integrative Tools (David Bujold)
Instructor(s): David Bujold
Content:
- Overview of downstream functional analysis tools
- Identifying differentially bound or differentially methylated sites
- Explore motifs in ChIP-seq peaks using HOMER
- Looking for significant GO enrichment using GREAT
- Explore available datasets from ENCODE, NIH Roadmap and IHEC using the IHEC Data Portal
- Study disease variant enrichments using GWAS resources
Presentation file(s):
PDF
Lab Practical
Instructor(s): David Bujold
Content:
- Using peaks and methylated regions identified in Module 3 and 4, look for motifs (using HOMER) and GO enrichment (using GREAT)
- Explore available reference epigenomic data sets using the IHEC Data Portal
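To give a feel for what a region-based enrichment statistic looks like, the following Python sketch computes a one-sided binomial p-value: the probability of seeing at least `k` of `n` input regions fall in a term's regulatory domain when that domain covers a fraction `p` of the genome. It is only a simplified illustration of this general kind of test, not GREAT's actual implementation, and the function names and numbers are hypothetical.

```python
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def fold_enrichment(k, n, p):
    """Observed fraction of regions hitting the term over the expected fraction."""
    return (k / n) / p


# Hypothetical example: 3 of 10 peaks fall in a term whose regulatory
# domain covers 10% of the genome.
hits, regions, genome_fraction = 3, 10, 0.1

p_value = binomial_sf(hits, regions, genome_fraction)
print(round(p_value, 4))
# 0.0702
print(round(fold_enrichment(hits, regions, genome_fraction), 2))
# 3.0
```

Direct summation is adequate for small examples; for thousands of peaks a statistics library with a numerically stable survival function would be the safer choice, and p-values across many terms need multiple-testing correction.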
Presentation file(s):
PDF