
Course Description

Before we can begin to apply rigorous statistical tools to research data, we often need to approach our data intuitively, and look for meaningful associations, surprising patterns, or irregularities, to formulate hypotheses. This is Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies that are available for EDA through the free statistical workbench R. Steps covered in this workshop are broadly relevant for many areas of modern, quantitative biology such as flow cytometry, expression profile analysis, function prediction and more. Participants will gain practical experience and skills to be able to use R to visualize and investigate patterns in their data.

Target Audience

Graduates, postgraduates, and PIs who design and execute data analysis strategies and who have some familiarity with the R statistical workbench.


You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact the CBW for more information.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to exploratory data analysis (EDA) for biological data

  • EDA principles
  • Reading and writing data
  • Reading in common file formats
  • Accessing and using this data once it’s in R (numeric data, sequences, annotations, and networks)
  • Recoding variables (factors, regular expressions, missing values, etc.)
  • Missing data handling and analysis
  • Writing R objects to different file formats
  • Descriptive statistics: mean/median and variance, quantiles, outliers
  • Plotting in R: basics, advanced options, special packages and best practices (base R, ggplot)
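
As a rough illustration of the descriptive-statistics and missing-data topics listed above, here is a minimal sketch. The workshop itself works in R; this example uses Python's standard library only, and the measurements, the use of `None` for missing values, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, not material taken from the course.

```python
import statistics

# Hypothetical measurements; None marks a missing value (an assumption for this sketch).
raw = [2.1, 2.4, None, 2.2, 9.8, 2.5, 2.3, None, 2.0]

# Handle missing data explicitly: count it, then drop it before summarising.
n_missing = sum(1 for v in raw if v is None)
values = [v for v in raw if v is not None]

# Location and spread.
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)

# Quartiles, interpolating between observed data points.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Flag outliers with Tukey's rule: more than 1.5 * IQR beyond the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print(f"missing={n_missing} mean={mean:.3f} median={median} variance={variance:.3f}")
print(f"quartiles=({q1:.2f}, {q2:.2f}, {q3:.2f}) outliers={outliers}")
```

Comparing the mean with the median is a quick way to see how strongly a single extreme value can pull the mean, which is one reason EDA looks at both.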

Module 2: Clustering

  • Programming basics
  • Get it done: functions and their arguments
  • Get it really done: debugging
  • Slow and fast: loops vs. vectorized operations
  • Get even more done: finding and installing useful packages

Module 3: Dimension reduction

  • Why visualize multi-dimensional data
  • Dimensionality reduction with Principal Components Analysis
  • Conduct PCA on different types of data
  • Getting information out of PCA objects in R
  • Some practical uses of PCA
  • Plotting and learning from PCA output
  • PCs as control variables in your analysis
  • PCs as variables of interest in your analysis
  • Other types of dimensionality reduction
  • t-distributed stochastic neighbor embedding (t-SNE)
  • uniform manifold approximation and projection (UMAP)
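
To make the idea behind Principal Components Analysis concrete, the sketch below computes the two principal components of a tiny two-variable data set by hand. It is a Python illustration under stated assumptions, not the workshop's R material: the data points are invented, and the closed-form eigen-decomposition used here only works for two variables. Real analyses would normally rely on a library routine such as R's `prcomp`.

```python
import math

# Hypothetical data: two correlated measurements per sample (invented for this sketch).
data = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 1.0), (4.0, 4.2)]
n = len(data)

# Step 1: centre each variable on its mean.
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centred = [(x - mean_x, y - mean_y) for x, y in data]

# Step 2: sample covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centred) / (n - 1)
b = sum(x * y for x, y in centred) / (n - 1)
c = sum(y * y for _, y in centred) / (n - 1)

# Step 3: eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
lambda1 = half_trace + radius  # variance along PC1
lambda2 = half_trace - radius  # variance along PC2

# Step 4: unit eigenvector for lambda1 (assumes b != 0, i.e. the variables covary).
norm = math.hypot(b, lambda1 - a)
pc1 = (b / norm, (lambda1 - a) / norm)
pc2 = (-pc1[1], pc1[0])  # perpendicular to PC1

# Step 5: project the centred samples onto the components to get PC scores.
scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1]) for x, y in centred]

# Proportion of total variance captured by the first component.
explained = lambda1 / (lambda1 + lambda2)
print(f"PC1 direction={pc1} explained variance ratio={explained:.3f}")
```

The scores are the kind of per-sample values that can then be plotted or carried into a later analysis as variables.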

Module 4 (short): Making a report of your analyses

  • R Markdown: creating a notebook of your EDA

Workshop Details:

Duration: 2 days

Start: Jun 28, 2021

End: Jun 29, 2021


Course Mode: Online

Status: Registration Closed

Workshop Started


CAD $259 for applications received from March 18, 2021 to May 28, 2021
CAD $345 for applications received from May 29, 2021 to June 23, 2021
Limited to: 40 participants

Lead Instructors:

Open Access Content:

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.


