
Course Description

Before we can apply rigorous statistical tools to research data, we often need to approach the data intuitively, looking for meaningful associations, surprising patterns, or irregularities that help us formulate hypotheses. This is Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies for EDA available in R, the free statistical workbench. The steps covered are broadly relevant to many areas of modern quantitative biology, such as flow cytometry, expression profile analysis, and function prediction.

Course Objectives

Participants will gain the practical experience and skills needed to use R to visualize and investigate patterns in their data.

Target Audience

Graduates, postgraduates, and PIs who design and carry out data analysis strategies and have some familiarity with the R statistical workbench.

Prerequisites

You will need your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3–4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact support@bioinformatics.ca for more information.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to exploratory data analysis (EDA) for biological data

EDA principles
Reading and writing data
Reading in common file formats
Accessing and using the data once it’s in R (numeric data, sequences, annotations, and networks)
Recoding variables (factors, regular expressions, missing values, etc.)
Missing data handling and analysis
Writing R objects to different file formats
Descriptive statistics: mean/median and variance, quantiles, outliers
Plotting in R: basics, advanced options, specialized packages, and best practices (base R graphics, ggplot2)

Module 2: Clustering