Before we can apply rigorous statistical tools to research data, we often need to approach the data intuitively, looking for meaningful associations, surprising patterns, or irregularities that help us formulate hypotheses. This intuitive first pass is exploratory data analysis, and it is the focus of the Analysis using R (AUR) workshop. The workshop introduces the essential tools and strategies available for exploratory analysis through the free statistical workbench R. The steps covered are broadly relevant to many areas of modern quantitative biology, such as flow cytometry, expression profile analysis, function prediction, and more.
Participants will gain the practical experience and skills needed to use R to visualize and investigate patterns in their data.
This workshop is intended for graduates, postgraduates, and PIs who design and execute strategies for data analysis and who are using the R statistical workbench.
You are expected to be a regular user of R. If you do not regularly use R, please begin by taking the Introduction to R workshop.
You will require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5 GHz CPU, 2 GB RAM, 10 GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact support@bioinformatics.ca for more information.
This workshop requires participants to complete pre-workshop tasks and readings.
Module 1: Exploratory data analysis Overview & Clustering
- Knowing your data: An overall workflow for exploratory data analysis
- Understand the difference between response variables, explanatory variables, biological variation, technical variation, and batch effects
- Missing data: understand how to identify structured versus unstructured missingness, and the role of imputation
- Finding unwanted sources of variation: surrogate variable analysis and RUVSeq
- Knowing your data’s structure. Calculating “distance” between (high-dimensional) data points
- What distance metrics represent
- Different kinds of distance metrics and when to use them
- Clustering principles & methods
- Why cluster?
- A survey of clustering methods
- Choose the clustering method that is right for your data
- Assessing the quality of clustering results
- Metrics for identifying the optimal number of clusters
- Existential questions introduced by clustering
Module 2: Dimensionality reduction
- What dimensionality reduction is, and its common applications in bioinformatics
- Dimensionality reduction with Principal Components Analysis (PCA)
- Conduct PCA on different types of data
- Get information out of PCA objects in R
- Some practical uses of PCA
- Plot and learn from PCA output
- Use PCs as control variables in your analysis
- Use PCs as variables of interest in your analysis
- Other types of dimensionality reduction
- t-distributed stochastic neighbor embedding (t-SNE)
- uniform manifold approximation and projection (UMAP)
Module 3: Fitting generalized linear models
- Read different data files into R
- Merge data and handle missing values
- Use ggplot to create and modify publication-quality R plots
- Plot and fit a linear model for a continuous-valued outcome and a logistic model for a dichotomous outcome
Module 4: Differential expression analysis
- Manually conduct many parallel statistical tests
- Different types of statistical tests
- Evaluate and plot output
- Extract output for tables
- Visualize p-values from multiple testing: Q-Q plot, volcano plot
- Correct for multiple statistical tests: Bonferroni, false discovery rate
- Using Bioconductor for analysis
- Perform differential expression analysis
Duration: 2 days
Start: Jun 28, 2023
End: Jun 29, 2023
Status: Registration Closed
Workshop Ended
Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.