Analysis of Metagenomic Data


Course Objectives

Metagenomics, the sequencing of DNA directly from a sample without first culturing and isolating the organisms, has become the principal tool of “meta-omic” analysis. It can be used to explore the diversity, function, and ecology of microbial communities. The Canadian Bioinformatics Workshops (CBW) has developed a 3-day course that provides an introduction to metagenomic data analysis, followed by hands-on practical tutorials demonstrating the use of metagenome analysis tools. The tutorials are designed as self-contained units that include example data and detailed instructions for installing all required bioinformatics tools.

Participants will gain practical experience and skills to be able to:

  • Design appropriate microbiome-focused experiments
  • Understand the advantages and limitations of metagenomic data analysis
  • Devise an appropriate bioinformatics workflow for processing and analyzing metagenomic sequence data (marker-gene, shotgun metagenomic, and metatranscriptomic data)
  • Apply appropriate statistics to undertake rigorous data analysis
  • Visualize their data to gain intuitive insight into the composition and/or activity of microbial communities

Target Audience

Graduates, postgraduates, staff bioinformaticians, and PIs who are working with, or about to embark on the analysis of, marker-gene, metagenomic, and metatranscriptomic data from microbiome-focused experiments.

Prerequisites: Basic familiarity with the Linux environment and with statistical analysis is required. Participants must be able to complete and understand the following simple Linux tutorial before attending:

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Send us an email for more information.

Pre-work and pre-readings can be found on the student workshop pages.

Course Outline

Day 1

Module 1: Introduction to Metagenomics (Will Hsiao)

  • Review of relevant terms (microbial communities, microbiome, species, metagenome, marker genes, metatranscriptomics)
  • Technologies used in meta’omics
  • Experimental design and sample preparation considerations
  • Meta’omic surveys: primary objectives, types, and workflows
  • 16S rRNA genes vs. shotgun sequencing
  • Starting points in metagenome data analysis: sequence files, resources, reference databases
  • Sample collection and storage

Module 2: Marker-Gene-Based Analysis of Taxonomic Composition (Will Hsiao)

  • Advantages of marker-gene based analysis
  • Reference databases
  • Sequence quality, de-replication, and error correction
  • Alpha- and Beta-diversity measures
  • Comparison of samples based on taxonomic compositions

Lab Practical:

  • Setting up Amazon Web Services
  • Sequence data cleanup (trimming, binning) using QIIME
  • Sequence de-replication
  • Understanding common file formats (BIOM)
  • Alpha- and Beta-diversity measures

Module 3: Introduction to PICRUSt (Morgan Langille)

  • Approaches for metagenomic inference
  • An overview of the PICRUSt approach
  • Limitations to metagenomic inference
  • PICRUSt 2.0: Functional predictions from 16S data

Lab Practical:

  • Functional predictions from 16S data

Integrated Assignment Part 1:

  • Use a 16S taxonomic profile to select a subset of samples for metagenomic analysis. This mimics a study design in which a marker-gene-based survey is used to select a subset of samples for shotgun sequencing and analysis.

Day 2

Module 4: Metagenomic Taxonomic and Functional Composition (Morgan Langille)

  • Contrast taxonomic and functional annotation
  • Discuss the difficulties of determining the taxonomic and functional composition of a metagenomic sample
  • Comparison of taxonomic assignment methods
    • Binning-based methods (assigning taxonomy to most reads)
    • Marker-based methods (using only some of the shotgun sequence data)
  • Overview of functional databases: KEGG (KOs, Modules, Pathways), MetaCyc, COG, SEED, GO, Pfam, and custom databases
  • An overview of several existing methods
  • An in-depth description of MetaPhlAn2 and HUMAnN2

Lab Practical:

  • Assign taxonomy with MetaPhlAn2
  • Functionally annotate reads using HUMAnN2
  • Visualize taxonomic and functional differences across samples

Module 5: Metagenome Assembly, Binning, and Extracting Genomes from Metagenomes (Laura Hug)

  • Assembling metagenomes
  • Binning
  • Pulling genomes from metagenomes

Lab Practical:

  • Hands-on experience assembling metagenomes, binning, and extracting genomes

Integrated Assignment Part 2:

  • Use a new metagenomic dataset to determine the taxonomic and functional composition of all samples. Write a summary of the taxa and functions that differ significantly between the healthy control group and the disease group.

Day 3

Module 6: Metatranscriptomics (John Parkinson)

  • Gene expression in a microbiome vs. functional composition
  • RNA-seq applied to microbiomes:
    • Experimental design: additional considerations
    • Sample collection, storage, and preparation
    • Processing metatranscriptomic reads: filters and assembly
  • Functional and taxonomic inference from metatranscriptome reads
    • Tools for functional inference: pathways, processes, and networks
    • GIST for taxonomic inference

Lab Practical:

  • Reads to function and RPKM statistics
  • Reads to taxonomy using GIST
  • Statistics using ALDEx2
  • Functional and taxonomic visualization with Cytoscape

Module 7: Statistical Tests for Metagenomics (Robert Beiko)

  • Appropriate statistical tests for metagenomics

Lab Practical:

  • Visualization and statistical comparison with STAMP

Module 8: Biomarker Selection (Fiona Brinkman)

  • Benefits and applications of biomarkers
  • Types of markers: taxonomic, functional
  • Examples of existing biomarkers
  • Methods for identifying new markers
    • Normalization, copy number variation, and other considerations
    • Finding differential features: categorical, correlative
    • Ranking features
    • Network-based analysis
  • Towards a genetic test: designing PCR/qPCR primers and tests
  • Example of a successful biomarker identification
  • General considerations, cautionary notes