Microbiome Summer School: Big Data Analytics for Omics Science CIHR/CBW/CERC/BDRC (2017)

Course Objectives

Bioinformatics has become an important part of many areas of health research. Fields such as microbiome, cancer, immunology and clinical research are generating massive amounts of data that require appropriate computational tools for their analysis and interpretation. Moreover, the need for bioinformatics training is more pressing than ever as big data become increasingly prevalent in health research.

The Canadian Institutes of Health Research (CIHR), the Université Laval Big Data Research Centre, the Canada Excellence Research Chairs (CERC) Program, and the Canadian Bioinformatics Workshops (CBW) are offering a 4-day workshop centered on the microbiome. This workshop will provide an introduction to the use of bioinformatics in microbiome research, as well as to big data analytics applied to this field.

Participants will gain practical experience and skills to be able to:

  • Design appropriate microbiome-focused experiments
  • Understand the advantages and limitations of metagenomic data analysis
  • Devise an appropriate bioinformatics workflow for processing and analyzing metagenomic sequence data (marker-gene, shotgun metagenomic, and metatranscriptomic data)
  • Apply appropriate statistics to undertake rigorous data analysis
  • Visualize datasets to gain intuitive insights into their composition and/or activity
  • Use machine learning approaches to solve problems

Target Audience

Graduates, postgraduates, staff bioinformaticians, lab managers, and early- to mid-career investigators who intend to work or are already working in microbiome research from a biomedical/health perspective and who want to gain experience in bioinformatics.

Prerequisites: Basic familiarity with the Linux environment and statistical analysis is required. Participants must be able to complete and understand the Introduction to Linux and Cytoscape tutorials; this familiarity can be gained through online activities. You should be familiar with the UNIX concepts covered in tutorials 1-3.

Course Outline

Day 1

Welcome and Registration (2017)

Day 2

Module 0: How to Use Compute Canada Resources

  • Connecting to HPC infrastructure

Module 1: Microbiome and Big Data Analytics (2017) (Instructor: Jacques Corbeil)

  • Review of relevant terms (microbial communities, microbiome, species, metagenome, marker genes, metatranscriptomics)
  • Technologies used in meta’omics

Module 2: Microbiomics/Metagenomics (2017) (Instructor: Robert Beiko)

  • Experimental design and sample preparation considerations
  • Meta’omic surveys: primary objectives, types, and workflows
  • 16S rRNA gene sequencing vs. shotgun sequencing
  • Starting points in metagenome data analysis: sequence files, resources, reference databases

Lab Practical: Read processing/QIIME2

  • Explore and understand raw read files
  • Quality control (QC) filtering
  • Single-end vs. paired-end sequence data
  • Sequence data cleanup (trimming, binning) using QIIME2
  • Sequence de-replication
  • Understanding common file formats (BIOM)
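To make the quality-control and dereplication steps concrete, here is a minimal pure-Python sketch. It is not what QIIME2 does internally (QIIME2 performs these steps through plugins such as DADA2 or Deblur); it assumes Phred+33 quality encoding, and the function names and the thresholds `min_q=20` and `min_len=50` are hypothetical choices made for illustration.

```python
from collections import Counter


def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual]


def trim_trailing(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end]


def dereplicate(seqs):
    """Collapse identical sequences into (sequence, count) pairs, most abundant first."""
    return Counter(seqs).most_common()


def qc_and_dereplicate(reads, min_q=20, min_len=50):
    """Trim each (sequence, quality) read, drop reads shorter than min_len, then dereplicate."""
    kept = []
    for seq, qual in reads:
        trimmed = trim_trailing(seq, qual, min_q)
        if len(trimmed) >= min_len:
            kept.append(trimmed)
    return dereplicate(kept)
```

Trimming only from the 3' end reflects the tendency of Illumina base quality to decline toward the end of a read; dereplication then lets later steps work on unique sequences weighted by their counts.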

Module 2b: Metagenomics (2017) (Instructors: Morgan Langille, Frédéric Raymond)

  • Determining the taxonomic composition of a microbiome sample
  • Why this is a hard problem (e.g. lateral gene transfer, short reads)
  • Comparison of taxonomic assignment methods
    • Binning-based methods (assigning taxonomy to most reads)
    • Marker-based methods (using only some of the shotgun sequence data)
  • Determining the functional composition of a microbiome sample
  • Functional composition vs. taxonomic composition?
  • Pros and cons of assembly
  • Overview of functional databases: KEGG (KOs, Modules, Pathways), COG, SEED, GO, PFAM, your own custom database
  • Comparison of tools such as MEGAN, Ray Meta, MetAMOS, and HUMAnN2

Lab Practical: MetaPhlAn2/HUMAnN2/STAMP/Ray Meta

  • Assign taxonomy with MetaPhlAn2
  • Annotate reads using HUMAnN2
  • Visualization and statistical comparisons with STAMP
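Statistical comparisons of this kind test each taxon or function for a difference between two groups of samples. As a hedged illustration of one common two-group test, the sketch below computes Welch's t statistic (which does not assume equal variances) and the Welch-Satterthwaite degrees of freedom from per-sample relative abundances of a single feature. The function name `welch_t` is hypothetical, and no p-value is computed here.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a, group_b: relative abundances of one feature in each sample of the
    two groups (at least two samples per group, not all identical)."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of each group mean (sample variance uses n - 1).
    se2_a = variance(group_a) / na
    se2_b = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value. Because one test is run per feature, the resulting p-values should be corrected for multiple testing (for example with the Benjamini-Hochberg procedure).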

Day 3

Module 3: Biomarkers (2017) (Instructor: Fiona Brinkman)

  • Benefits and applications of biomarkers
  • Types of markers: taxonomic, functional
  • Examples of existing biomarkers
  • Methods for identifying new markers
    • Normalization, copy number variation, and other considerations
    • Finding differential features: categorical, correlative
    • Ranking features
    • Network-based analysis
  • Towards a genetic test: Designing PCR/qPCR primers/tests
  • Example of a successful biomarker identification
  • General considerations, cautionary notes
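Normalization and feature ranking can be sketched in a few lines. This is one simple approach among many, not the method taught in the module: it assumes total-sum scaling (converting counts to relative abundances) and ranks features by the absolute log2 fold change between case and control means. The function names and the pseudocount value are hypothetical.

```python
import math


def total_sum_scale(counts):
    """Convert one sample's raw counts (feature -> count) to relative abundances."""
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def rank_by_log2_fold_change(cases, controls, pseudocount=1e-6):
    """Rank features by |log2(mean case abundance / mean control abundance)|.

    cases, controls: lists of TSS-normalized samples (dicts); a feature missing
    from a sample is treated as zero. The pseudocount avoids log(0)."""
    features = sorted({f for sample in cases + controls for f in sample})
    scores = {}
    for f in features:
        mean_case = sum(s.get(f, 0.0) for s in cases) / len(cases)
        mean_ctrl = sum(s.get(f, 0.0) for s in controls) / len(controls)
        scores[f] = math.log2((mean_case + pseudocount) / (mean_ctrl + pseudocount))
    # Largest absolute change first; ties keep alphabetical feature order.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

A fold-change ranking says nothing about statistical significance, and relative abundances are compositional, so a real biomarker analysis would pair a ranking like this with an appropriate statistical test and the cautionary considerations above.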

Module 4: PICRUSt (2017) (Instructor: Morgan Langille)

  • Predicting the abundance of a single function
  • Predicting metagenomes

Lab Practical: PICRUSt

  • Functional predictions from 16S data
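PICRUSt's metagenome prediction can be summarized in two arithmetic steps: divide each OTU's 16S read count by its predicted 16S rRNA gene copy number, then multiply the normalized OTU abundances by each OTU's predicted gene content and sum over OTUs. The sketch below illustrates only that arithmetic. In PICRUSt, the per-OTU copy numbers and gene contents are precomputed by ancestral state reconstruction over a reference phylogeny; here they are hypothetical dictionaries, and the function names and KEGG Orthology IDs used in testing are placeholders.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU's read count by its (predicted) 16S rRNA gene copy number."""
    return {otu: n / copy_numbers[otu] for otu, n in otu_counts.items()}


def predict_metagenome(otu_counts, copy_numbers, gene_content):
    """Predicted abundance of each gene family: the sum over OTUs of
    (copy-number-normalized OTU abundance x predicted copies of that gene in the OTU)."""
    normalized = normalize_by_copy_number(otu_counts, copy_numbers)
    predicted = {}
    for otu, abundance in normalized.items():
        for gene, copies in gene_content.get(otu, {}).items():
            predicted[gene] = predicted.get(gene, 0.0) + abundance * copies
    return predicted
```

The copy-number step matters because an OTU whose genome carries several 16S copies would otherwise appear more abundant than it really is.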

Module 5: Metatranscriptomics (2017) (Instructor: John Parkinson)

  • Gene expression in a microbiome vs. functional composition
  • RNA-seq applied to microbiomes:
    • Experimental design: additional considerations
    • Sample collection, storage and preparation
    • Processing metatranscriptomics reads: filters and assembly
  • Functional and taxonomic inference from metatranscriptome reads
    • Tools for functional inference: pathways, processes and networks
  • GIST for taxonomic inference
  • Statistical methods

Lab Practical: Metatranscriptomic analyses/Cytoscape/Ray Surveyor

  • Reads to function and RPKM statistics
  • Reads to taxonomy using GIST
  • Statistics using ALDEx2
  • Functional and taxonomic visualization with Cytoscape and Ray Surveyor
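The RPKM values and ALDEx2-style statistics in this lab rest on two small formulas. RPKM scales a gene's read count by gene length (in kilobases) and sequencing depth (in millions of mapped reads), i.e. RPKM = reads × 10^9 / (gene length in bp × total mapped reads). ALDEx2 itself is an R/Bioconductor package that draws Monte Carlo instances from a Dirichlet distribution and applies a centred log-ratio (CLR) transform to each one. The sketch below shows only the CLR step on a single count vector, with a pseudocount of 0.5 chosen as an assumption; the function names are hypothetical.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's counts.

    Each value is log(count + pseudocount) minus the mean of those logs, i.e. the
    log of its ratio to the geometric mean of the pseudocounted sample."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]
```

For example, 500 reads on a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 25. CLR values always sum to zero within a sample, which is why they are compared as relative rather than absolute changes.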

Keynote address 1: Vincenzo Di Marzo

Day 4

Module 6: Host Genomics Applied to the Microbiome (2017) (Instructor: Marie-Pierre Dubé)

  • How host genetics modulates the microbiome

Integrated Assignment: Review of Labs 1-4
Use a 16S taxonomic profile to select a subset of samples for metagenomic analysis. This mimics a study design in which a marker gene-based survey is used to select a subset of samples for shotgun sequencing and analysis.
Use a new metagenomic dataset to determine the taxonomic and functional composition of all samples. Write a summary of the taxa and functions that differ significantly between the healthy control group and the disease group.

Module 7: Machine Learning (2017) (Instructor: François Laviolette)

  • General principles of machine learning
  • How to apply machine learning to genomic data
  • Interpretable machine learning

Lab Practical: Machine Learning Analysis

  • Applying basic machine learning algorithms to omic data on supercomputers
  • Hyperparameter tuning methods for learning algorithms
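Hyperparameter tuning is commonly done by cross-validation: each candidate value is scored on held-out folds, and the best-scoring value is kept. The pure-Python sketch below tunes the number of neighbours `k` of a k-nearest-neighbour classifier. The classifier, the contiguous fold scheme and the function names are illustrative assumptions, not the algorithms used in the lab.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=3):
    """Mean k-NN accuracy over n_folds contiguous folds (rows assumed pre-shuffled)."""
    fold_size = math.ceil(len(X) / n_folds)
    accuracies = []
    for f in range(n_folds):
        test_idx = set(range(f * fold_size, min((f + 1) * fold_size, len(X))))
        if not test_idx:
            continue
        train = [i for i in range(len(X)) if i not in test_idx]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def grid_search_k(X, y, candidate_ks, n_folds=3):
    """Score every candidate k by cross-validation and return (best_k, all scores)."""
    scores = {k: cross_val_accuracy(X, y, k, n_folds) for k in candidate_ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

With real omic data, the folds should be shuffled and stratified by class, and the tuned model should be evaluated on a separate test set (or by nested cross-validation) so that the reported accuracy is not optimistically biased. Libraries such as scikit-learn provide this workflow through `GridSearchCV`.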

Keynote address 2: William Noble

Day 5

Module 7b: Combining Biological Data (2017) (Instructor: Anna Goldenberg)

  • Combining omic data from multiple heterogeneous sources while addressing missing data and noise

Module 8: Algorithms for Mass Spectrometry Applied to Metabolomics (2017) (Instructor: Mario Marchand)

  • Algorithms for aligning mass spectra and detecting virtual lock masses
  • Machine learning algorithms to produce sparse predictors from mass spectra
    • Boosting algorithms
    • Decision trees
    • The set covering machine
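The set covering machine builds a conjunction (a logical AND) of boolean rules greedily. At each step, it picks the rule with the highest utility: the number of still-uncovered negative examples the rule rejects, minus a penalty `p` times the number of remaining positive examples it wrongly rejects. The sketch below assumes the features are already binary (for instance, the hypothetical presence or absence of a peak in a mass spectrum). The extra stop when no rule has positive utility, and the function names, are assumptions of this sketch rather than part of a specific implementation such as Kover.

```python
def scm_conjunction(X, y, p=1.0, max_rules=3):
    """Greedy set covering machine learning a conjunction of boolean features.

    X: list of samples, each a list of booleans; y: 1 for positive, 0 for negative.
    Returns the indices of the selected features (rules)."""
    negatives = {i for i, label in enumerate(y) if label == 0}  # not yet covered
    positives = {i for i, label in enumerate(y) if label == 1}  # not yet misclassified
    rules = []
    while negatives and len(rules) < max_rules:
        best, best_utility = None, float("-inf")
        for j in range(len(X[0])):
            if j in rules:
                continue
            covered = sum(1 for i in negatives if not X[i][j])  # negatives the rule rejects
            errors = sum(1 for i in positives if not X[i][j])   # positives it wrongly rejects
            utility = covered - p * errors
            if utility > best_utility:
                best, best_utility = j, utility
        # Assumption of this sketch: stop early if no rule has positive utility.
        if best is None or best_utility <= 0:
            break
        rules.append(best)
        negatives = {i for i in negatives if X[i][best]}
        positives = {i for i in positives if X[i][best]}
    return rules


def scm_predict(rules, x):
    """A sample is positive only if every selected feature is present (logical AND)."""
    return int(all(x[j] for j in rules))
```

The resulting model is a short list of features, which is what makes the set covering machine attractive when an interpretable, sparse predictor is wanted.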

Module 9: Algorithms for Drug Discovery (2017) (Instructor: Chloé-Agathe Azencott)

  • Efficient multi-locus biomarker discovery
  • How to make sense of data with a small number of samples and a large number of variables

Lab Practical: Machine Learning Analytics

  • Interpretable and parsimonious algorithms applied to omic data (Lasso, Group Lasso, SCM-Kover)
  • Similarity network fusion for aggregating data types on a genomic scale
  • Using algorithms to align mass spectra
  • Using machine learning algorithms (boosting, set covering machines, decision trees) to classify mass spectra
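The Lasso listed in this lab produces sparse, interpretable models in settings with few samples and many variables by adding an L1 penalty to least-squares regression. The sketch below minimizes (1/(2n))·||y − Xw||² + λ·||w||₁ by cyclic coordinate descent with soft thresholding. It assumes the columns of X and y are already centred (so there is no intercept), and the function names are hypothetical. Real analyses would typically use a library implementation such as scikit-learn's `Lasso`, whose `alpha` parameter plays the role of λ here.

```python
def soft_threshold(a, lam):
    """Shrink a toward zero by lam, returning exactly zero inside [-lam, lam]."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X: list of n rows with p features each; y: list of n targets (both centred)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                continue
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / z
            # Keep the residual consistent with the updated coefficient.
            for i in range(n):
                residual[i] -= X[i][j] * (new_wj - w[j])
            w[j] = new_wj
    return w
```

Features whose correlation with the residual never exceeds λ keep a coefficient of exactly zero, which is what makes the fitted model sparse; increasing λ removes more features.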

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.