Machine Learning

Course Objectives

This workshop provides an introduction to machine learning and its application to bioinformatics. It is not intended for machine learning experts. Instead, it targets biologists and other life scientists who want to understand what machine learning is, what it can do, and how it can be used for a variety of bioinformatics or medical informatics applications. Students will gain experience in:

  • Applications and Limitations of Machine Learning and Deep Learning
  • Data encoding for Machine Learning
  • Artificial Neural Networks (ANNs) - how they work and how they can be used in bioinformatic applications (secondary structure prediction)
  • ANNs – how to program a useful ANN for bioinformatics in Python
  • Hidden Markov Models (HMMs) - how they work and how they can be used in bioinformatics applications (gene finding)
  • HMMs – how to program a useful HMM for bioinformatics in Python
  • Support Vector Machines, Decision Trees and Random Forests – how they work and how they can be used in bioinformatic applications (biomarker discovery and modeling)
  • Using Machine Learning tools on the Web (WEKA)
  • Using Machine Learning Apps (TensorFlow)

Target Audience

Graduates, postgraduates, staff bioinformaticians and PIs who are working with, or about to embark on using, machine learning for bioinformatics applications.

Prerequisites: Familiarity with Linux or Unix operating systems and with Python.

You will also require your own laptop computer. Minimum requirements: 1024x768 screen resolution, 1.5GHz CPU, 2GB RAM, 10GB free disk space, and a recent version of Windows, Mac OS X or Linux (most computers purchased in the past 3-4 years likely meet these requirements). If you do not have access to your own computer, please contact course_info@bioinformatics.ca for other possible options.

This workshop requires participants to complete pre-workshop tasks and readings.

Course Outline

Module 1: Introduction to machine learning

  • Examples of machine learning in bioinformatics
  • Brief introduction to machine learning methods, including:
      • artificial neural networks
      • hidden Markov models
      • decision trees and random forests
      • deep neural networks
      • support vector machines

Module 2: Artificial Neural Networks

  • The meaning of hidden layers. An ANN simulation
  • Coding (Python) a simple ANN to do secondary structure prediction

Module 3: Hidden Markov Models

  • An HMM Simulation
  • Coding (Python) a simple HMM to do gene prediction
  • Decision Trees, Random Forests and Support Vector Machines
  • Coding (Python) to perform biomarker optimization with an SVM

Module 4: Machine Learning on the Web

  • Structured example of machine learning with bioinformatics data using different approaches
  • Encoding data for machine learning applications
  • Assessing model performance

Module 5: Machine Learning Apps (TensorFlow)

  • Structured example of machine learning with bioinformatics data using TensorFlow
  • Examples of deep learning with TensorFlow