
Course Description
Participants will gain the practical experience and skills needed to use R to visualize and investigate patterns in their data.
Target Audience
Graduates, postgraduates, and PIs who design and execute strategies for data analysis and who are using the R statistical workbench.
Prerequisites
You are expected to be a regular user of R. If you do not regularly use R, please begin by taking the Introduction to R workshop.
You will require your own laptop computer. Minimum requirements: 1024×768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X, or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, you may borrow one from the CBW. Please contact support@bioinformatics.ca for more information.
This workshop requires participants to complete pre-workshop tasks and readings.
Course Outline
Module 1: Exploratory data analysis overview & clustering
- Knowing your data: An overall workflow for exploratory data analysis
- Understand the difference between response variables, explanatory variables, biological variation, technical variation, and batch effects
- Missing data: understand how to identify structured versus unstructured missingness, and the role of imputation
- Finding unwanted sources of variation: surrogate variable analysis and RUVSeq
- Knowing your data’s structure: calculating “distance” between (high-dimensional) data points
- What distance metrics represent
- Different kinds of distance metrics and when to use them
- Clustering principles & methods
- Why cluster?
- A survey of clustering methods
- Choose the clustering method that is right for your data
- Assessing the quality of clustering results
- Metrics for identifying the optimal number of clusters
- Existential questions introduced by clustering
Module 2: Dimensionality reduction
- What dimensionality reduction is, and its common applications in bioinformatics
- Dimensionality reduction with Principal Components Analysis (PCA)
- Conduct PCA on different types of data
- Get information out of PCA objects in R
- Some practical uses of PCA
- Plot and learn from PCA output
- Use PCs as control variables in your analysis
- Use PCs as variables of interest in your analysis
- Other types of dimensionality reduction
- t-distributed stochastic neighbor embedding (t-SNE)
- Uniform manifold approximation and projection (UMAP)
Module 3: Fitting generalized linear models
- Read different data files into R
- Merge data and handle missing values
- Use ggplot to create and modify publication-quality R plots
- Plot and fit a linear model for a continuous-valued outcome and a logistic model for a dichotomous outcome
Module 4: Differential expression analysis
- Manually conduct many parallel statistical tests
- Different types of statistical tests
- Evaluate and plot output
- Extract output for tables
- Visualize p-values from multiple testing: QQ plot, volcano plot
- Correct for multiple statistical tests: Bonferroni, false discovery rate
- Using Bioconductor for analysis
- Perform differential expression analysis
Workshop Details:
Duration: 2 days
Start: Jun 28, 2023
End: Jun 29, 2023
Course Mode: Onsite
Status: Registration Closed
Workshop Started
Open Access Content:
Canadian Bioinformatics Workshops promotes open access. Past workshop content is available under a Creative Commons License.