Introduction to R

Course Objectives

R is rapidly becoming the most important scripting language for both experimental and computational biologists. It is well designed, efficient, and widely adopted, and it has a very large base of contributors who add new functionality for all modern aspects of data analysis and visualization. Moreover, it is free and open source. However, R’s great power and expressivity can be difficult to approach at first without guidance, especially for those who are new to programming. This workshop introduces the essential ideas and tools of R. Although the workshop covers running statistical tests in R, it does not cover statistical concepts.

Participants will gain practical experience and skills to be able to:

  • Meet the challenges of data handling
  • Break down problems into structured parts
  • Use R syntax, functions and packages
  • Understand best practices for scientific computational work

Target Audience

Graduates, postgraduates, and PIs who design and execute strategies for data analysis but have little or no familiarity with the R statistical workbench. This workshop is designed to lead on to the two-day workshop on Exploratory Data Analysis, which follows it.

Prerequisites: You will require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact course_info@bioinformatics.ca for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Getting to Know R

  • The environment and the user interface
  • How to get help and where to find information
  • Syntax and language principles
  • Data types: numbers, time and factors, strings and text
  • Data classes: vectors, matrices, lists, data frames and hashes
  • Reading and writing data (including: from Excel and from the Web)
  • Only the best of my data: subsetting matrices, slicing, filtering and reshaping, plyr and dplyr

Integrated Assessment:

  • Using ggplot for (nicer) plots

Module 2: Programming Basics

  • Programming basics
  • Get it done: functions and their arguments
  • Get it really done: debugging
  • Slow and fast: loops vs. vectorized operations
  • Get even more done: finding and installing useful packages

Module 3: Using R for Data Analysis

  • The Bioconductor project
  • Have something to show for it: basic plots and slightly more advanced plots
  • The graphics state
  • 10% is 90%: Axes, margins, multiple plots and leg