Course Description

Before we can apply rigorous statistical tools to research data, we often need to approach our data intuitively, looking for meaningful associations, surprising patterns, or irregularities that help us formulate hypotheses. This is Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies available for EDA in the free statistical workbench R. The steps covered in this workshop are broadly relevant to many areas of modern quantitative biology, such as flow cytometry, expression profile analysis, function prediction, and more.

Participants will gain practical experience and skills to be able to:

  • Use R and its analysis tools, read and modify code, and explore protocols that can be adapted for their own research tasks.
  • Write R functions and analysis scripts.
  • Plot and visualize data, ranging from the elementary built-in routines and their (sometimes bewildering) array of parameters to sophisticated, publication-ready presentations.

Target Audience

Graduates, postgraduates, and PIs who design and execute data-analysis strategies and who have some familiarity with the R statistical workbench.


You will also require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux. (Most computers purchased in the past 3–4 years are likely to meet these requirements.) If you do not have access to your own computer, please contact for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Exploratory data analysis (EDA) for biological data

  • Exploratory data analysis principles
  • Reading and writing data from common biological file formats, including numeric data, sequences, annotations, and networks
  • Regular expressions
  • Descriptive statistics: mean/median and variance, quantiles, outliers
  • Plotting in R: basics, advanced options, special packages and best practices

Module 2: Regression

  • Types of models for regression analysis in R
  • Calculating linear regressions and plotting residuals
  • Non-linear regression with arbitrary functions
  • Maximal Information Coefficient (MIC)

Module 3: Dimension reduction

  • Visualizing multi-dimensional data
  • Dimensionality reduction with Principal Component Analysis
  • Using explicit models for data reduction
  • t-distributed Stochastic Neighbour Embedding (t-SNE)

Lab Practical

  • Principal component analysis of high-dimensional data
  • An integrated tutorial (until 8pm)

Module 4: Clustering

  • Calculating ‘distance’ between (high-dimensional) data points
  • Clustering principles & methods
  • Assessing the quality of clustering results

Lab Practical

  • Evaluation and comparison of different clustering techniques

Module 5: Hypothesis testing for EDA

  • Statistical models, hypotheses and how to test them
  • Quantifying quality: p-values, distributions, Z-scores and “significance”
  • Nonparametric approaches
  • Bootstrap and resampling techniques
  • Multiple testing corrections: Bonferroni, family-wise error rate, false discovery rate
  • Simulation testing

Workshop Details:

Duration: 2 days

Start: May 15, 2019

End: May 16, 2019

Location: Toronto, Ontario, Canada

Course Mode: Onsite

Status: Registration Closed

Limited to: 30 participants

Lead Instructors:

Open Access Content:

Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.