Exploratory Data Analysis and Essential Statistics using R
Vancouver Date: May 6-7, 2010 in Vancouver, BC
Lead Faculty (2010): Raphael Gottardo & Sohrab Shah
Registration Fee for Applications received before April 12, 2010: CLOSED
Registration Fee for Applications received after April 12, 2010: CLOSED
Toronto Date: September 30 - October 1, 2010 in Downtown Toronto, ON
Lead Faculty (2010): Raphael Gottardo & Boris Steipe
Registration Fee for Applications received before September 3, 2010: $500 + HST
Registration Fee for Applications received after September 3, 2010: $700 + HST
Apply now!
Target Audience
Graduates, postgraduates and PIs who need to design and execute strategies for data analysis but have little or no formal prior training in statistics.
Prerequisites: Your own laptop with R installed, and completion of an online tutorial on installation and basic use of R before the workshop. If you do not have access to a laptop, you may rent one from CBW for a fee. Please contact course_info@bioinformatics.ca for more information.
Course Objectives
Extracting true, meaningful information from a data set with enough confidence to guide future research projects requires a solid understanding of what statistical inferences are based on. The CBW Faculty has developed a 2-day intensive course covering essential topics in common statistical approaches. We will focus on the procedures, their underlying assumptions, and what to do when these assumptions don't apply. Participants should be able to implement analyses for common cases, identify potential problems in their own research, and define their statistical needs for cases in which expert advice is required. Case studies with common research scenarios, such as microarray and flow cytometry data, will emphasize practical skills.
Course Outline
Each module consists of an approximately 1.5-hour lecture, a 30-minute break and a lab.
A comprehensive lecture and laboratory manual will be provided.
Day 1
Module 1: Introduction to R (Faculty: Raphael Gottardo)
- Ice-breaking session for participants (to promote networking)
- Simple manipulations; numbers and vectors
- Basic functions (sort, summary, plot, etc.)
- Reading and writing data from/to files
- Conditional statements and loops
- Writing your own function
Break (30 minutes)
Module 2: Exploratory data analysis and common statistical assumptions in biology (Faculty: Raphael Gottardo)
- Review of mean/median, variance, boxplots, histograms, scatter plots, scatter plot smoothers
- Lab Practical: Exploratory data analysis of flow cytometry and microarray data
Lunch
Module 3: Hypothesis testing (Faculty: Raphael Gottardo)
- One sample and two sample t-tests, F-tests, p-values
- Type I error rate, Type II error rate
- Common assumptions and what to do if assumptions are not met
- Bootstrap and resampling techniques
- Multiple testing (family-wise error rate, false discovery rate)
- Power calculation and sample size
- Lab Practical: Finding differentially expressed genes with microarray data
Dinner
- Lab Practical: Working with your own data
Day 2
Module 4: Data reduction (Faculty: Raphael Gottardo)
- Data reduction with PCA
- Lab Practical: PCA of microarray data and flow cytometry data
Break (30 minutes)
Module 5: Clustering and classification (Faculty: Sohrab Shah/Boris Steipe)
- Clustering
- Classification
- Lab Practical: Functional annotation from clustering in expression data
Lunch
Module 6: Regression and correlation (Faculty: Sohrab Shah/Boris Steipe)
- Simple linear regression
- Least squares estimation; residuals and fitted values
- Logistic regression
- Prediction
- Common assumptions, model checking and what to do if common assumptions are not satisfied
- Model fitting with R
- Lab Practical: TBD
