Before we can apply rigorous statistical tools to research data, we often need to approach the data intuitively, looking for meaningful associations, surprising patterns, or irregularities that help us formulate hypotheses. This is Exploratory Data Analysis (EDA). This workshop introduces the essential tools and strategies available for EDA through the free statistical workbench R. The steps covered are broadly relevant to many areas of modern, quantitative biology, such as flow cytometry, expression profile analysis, function prediction and more. Participants will gain the practical experience and skills needed to use R to visualize and investigate patterns in their data.
Graduates, postgraduates, and PIs who design and execute data analysis strategies and have some familiarity with the R statistical workbench.
You will require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact email@example.com for more information.
This workshop requires participants to complete pre-workshop tasks and readings.
Module 1: Introduction to exploratory data analysis for biological data (EDA)
- EDA principles
- Reading and writing data
- Reading in common file formats
- Accessing and using this data once it’s in R (numeric data, sequences, annotations, and networks)
- Recoding variables (factors, regular expressions, missing values, etc.)
- Missing data handling and analysis
- Writing R objects to different file formats
- Descriptive statistics: mean/median and variance, quantiles, outliers
- Plotting in R: basics, advanced options, special packages and best practices (base R, ggplot)
Module 2: Clustering
- Programming basics
- Get it done: functions and their arguments
- Get it really done: debugging
- Slow and fast: loops vs. vectorized operations
- Get even more done: finding and installing useful packages
Module 3: Dimension reduction
- Why visualize multi-dimensional data
- Dimensionality reduction with Principal Components Analysis
- Conduct PCA on different types of data
- Getting information out of PCA objects in R
- Some practical uses of PCA
- Plotting and learning from PCA output
- PCs as control variables in your analysis
- PCs as variables of interest in your analysis
- Other types of dimensionality reduction
- t-distributed stochastic neighbor embedding (t-SNE)
- uniform manifold approximation and projection (UMAP)
Module 4 (short): Making a report of your analyses
- R Markdown: creating a notebook of your EDA
Duration: 2 days
Start: Jun 28, 2021
End: Jun 29, 2021
Course Mode: Online
Status: Registration Closed
Open Access Content:
Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.