
Course Description

Microbes are everywhere, and the study of microbiomes, which aims to elucidate the impact of microbes in various ecosystems, has become a multidisciplinary science involving microbiology, statistics, computer science, and molecular biology. Microbiomics, the study of microbes without first culturing and isolating the organisms, has become the principal approach to exploring the diversity, function, and ecology of microbial communities. The CBW has developed a 2-day course that introduces metagenomic data analysis (both read-based and assembly-based). Each module consists of a lecture covering theoretical and practical components, followed by a hands-on tutorial, guided by instructors and teaching assistants, that demonstrates the use of relevant bioinformatics and statistical tools.

Course Objectives

Participants will gain practical experience and skills to be able to:

Understand the advantages and limitations of metagenomic data analysis
Devise an appropriate bioinformatics workflow for processing and analyzing microbiome shotgun metagenomic sequence data
Perform both read-based and assembly-based analyses of shotgun metagenomic sequence data
Apply appropriate statistics to undertake rigorous data analysis

Target Audience

Graduates, postgraduates, staff bioinformaticians and PIs working with or about to embark on analysis of shotgun metagenomic sequence data from microbiome-focused experiments.

Prerequisites

Participants should be comfortable with reading and writing basic R or Bash, or be enrolled in the Beginner Microbiome Analysis course. Participants should have some prior experience with microbiome sequencing data (for example, have completed a marker gene analysis of microbiome data).

You will require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 2.4GHz CPU, 8GB RAM, 100GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements).

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to metagenomics and read-based profiling

Challenges and benefits of metagenomic data
Comparison of major approaches (read-based vs assembly-based)
Importance of sequencing depth and host contamination
Long-read vs short-read sequencing
Approaches for assigning taxonomy to shotgun metagenomic data

Lab practical

Quality control of short reads and removal of host contamination using KneadData in the Unix command line
Taxonomic profiling of short reads using Kraken and MetaPhlAn in the Unix command line
Analysis of taxonomic assignment results using Phyloseq in R

Module 2: Metagenomic Assembly and Binning

Overview of binning theory/approaches, advantages/disadvantages
Quality metrics for assembly and binning
Genome-resolved metagenomics using the anvi’o ecosystem

Lab practical

Assembly of short reads into contigs, binning of contigs and refinement of bins into Metagenome-Assembled Genomes (MAGs) using anvi’o in the Unix command line
Metrics for assessing MAG quality using anvi’o in the Unix command line

Module 3: Metagenomic functional annotation

Approaches for assigning functions to shotgun metagenomic data
Microbial gene annotation and normalizations
Methods for assigning functions to reads, and how general annotation of all reads differs from targeted annotation of specific functions such as antimicrobial resistance (AMR) or carbohydrate metabolism genes
Stratifying functions by taxonomy

Lab practical

General functional annotation using MMseqs and HUMAnN in the Unix command line
AMR annotation of reads using CARD RGI in the Unix command line
Visualization of functions stratified by taxonomy using JarrVis in R

Module 4: Advanced microbiome statistics

How to incorporate confounding variables into statistical analyses of microbiome data (e.g. age and sex in human-based studies)
Using machine learning (e.g. Random Forests) for diagnostics or identification of important microbial features

Lab practical