Course Description

Before we can apply rigorous statistical tools to research data, we often need to approach the data intuitively and look for meaningful associations, surprising patterns, or irregularities in order to formulate hypotheses. This is Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies available for EDA through the free statistical workbench R. The steps covered are broadly relevant to many areas of modern, quantitative biology, such as flow cytometry, expression profile analysis, function prediction, and more.

Course Objectives

Participants will gain practical experience and skills to be able to:

Use R and its analysis tools, read and modify code, and explore protocols that can be adapted for their own research tasks.
Write R functions and analysis scripts.
Plot and visualize data, from the elementary built-in routines with their (sometimes bewildering) array of parameters to sophisticated, publication-ready presentations.

Target Audience

Graduates, postgraduates, and PIs who design and execute strategies for data analysis and have some familiarity with the R statistical workbench.

Prerequisites

You will require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact support@bioinformatics.ca for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Exploratory data analysis for biological data (EDA)

Exploratory data analysis principles
Reading and writing data from common biological file-formats, including numeric data, sequences, annotations, and networks
Regular expressions
Descriptive statistics: mean/median and variance, quantiles, outliers
Plotting in R: basics, advanced options, special packages and best practices

Module 2: Regression

Types of models for regression analysis in R
Calculating linear regressions and plotting residuals
Non-linear regression with arbitrary functions
Maximum Information Coefficient

Module 3: Dimension reduction

Visualizing multi-dimensional data
Dimensionality reduction with Principal Components Analysis
Using explicit models for data reduction
t-distributed Stochastic Neighbour Embedding (t-SNE)

Lab Practical

Principal component analysis of high dimensional data
An integrated tutorial (until 8pm)

Module 4: Clustering

Calculating ‘distance’ between (high-dimensional) data points
Clustering principles & methods
Assessing the quality of clustering results

Lab Practical

Evaluation and comparison of different clustering techniques

Module 5: Hypothesis testing for EDA

Statistical models, hypotheses and how to test them
Quantifying quality: p-values, distributions, Z-scores and “significance”
Nonparametric approaches
Bootstrap and resampling techniques
Multiple testing corrections: Bonferroni, family-wise error rate, false discovery rate
Simulation testing

Workshop Details:

Duration: 2 days

Start: Jun 11, 2020

End: Jun 12, 2020

Location:

Status: Registration Closed

Workshop Ended

Limited to: 30 participants

Lead Instructors: