Analyzing Next-Generation Sequencing Data
May 31 - June 11th, 2010
Kellogg Biological Station, MSU
Applications for 2010 are now closed. See the course page for the 2011 course.
Course sponsor: Gene Expression in Disease and Development Focus Group at Michigan State University.
Instructors: Dr. C. Titus Brown and Dr. Gregory V. Wilson
This intensive two-week summer course will introduce students with a strong biology background to the practice of analyzing short-read sequencing data from the Illumina GA2 and other next-gen platforms. The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on mapping, assembly, and analysis of short-read data for resequencing, ChIP-seq, and RNAseq.
No prior programming experience is required, although familiarity with some programming concepts is suggested, and bravery in the face of the unknown is necessary. Two or more years of graduate school in a biological science is strongly suggested.
Students will gain practical experience in:
- Python and bash shell scripting
- cloud computing/Amazon EC2
- version control with git/mercurial
- basic software installation on UNIX
- installing and running maq, bowtie, and velvet
- querying mappings and evaluating assemblies
In addition to the on-site training, we are developing open online resources to complement and extend the course so that students can revisit and "upgrade" their learning as new tools and techniques emerge. These materials will be made available through the Software Carpentry Web site (http://software-carpentry.org/) at the end of the course.
What will students learn?
By the end of the course, students will be able to map short-read data to sequenced genomes and query the mapping for variation, transcript prevalence (from mRNAseq data), and enriched genomic regions (from ChIP-seq data). They will also know how to transfer large data sets between computers, run extended analyses while asleep, execute and modify existing Python scripts, and otherwise effectively make use of basic computational resources.
Location, dates, and course structure.
The course will be run at the W.K. Kellogg Biological Station on Gull Lake in western Michigan from 5pm on Monday, May 31st through noon on Friday, June 11th. Morning and afternoon lectures will be interspersed with practical hands-on labs. Room and board will be provided on-site (see enrolling, below). Sunday, June 6th, will be a day of rest & relaxation.
We hope to run this course again in 2011.
Applying for the course
An application is required for both MSU students and non-MSU students, and enrollment override will only be granted after application. We can accommodate approximately 25 students.
If you've applied, we will send you a confirmation on Friday, April 16th, and notify you about the status of your application by April 26th.
Tuition, course cost, and enrolling
For students who do not need course credit, we are charging a workshop fee of $350 per student instead of tuition. This covers everything but travel, room and board. Payment details will be posted later.
All students must also pay for on-site room and board (up to $285/wk, approx $570 total, depending on room - see Orchard Dorms) upon arrival. Applicants must commit to spending both weeks on-site.
Taking the course for credit
If you want credit for the course (either at MSU or as a Lifelong Education student) you must enroll in the course; contact us directly if you are interested. The course is listed as CSE 891 s431 / MMG 890 s433, 2 cr, at MSU.
If you are already paying summer tuition at one of the CIC universities (the Big 10 and U. Chicago), you should be able to take the course more cheaply; please check with your local graduate registrar.
No additional supplies or equipment are required; in particular, we will provide Mac OS X laptops for all students to use, although we encourage you to bring your own as well.
C. Titus Brown (http://ged.msu.edu/) holds a Ph.D. in Developmental Biology from Caltech, and has worked on digital evolution, physical meteorology, developmental biology, and genomics. He is currently an Assistant Professor in Computer Science and Engineering, and Microbiology and Molecular Genetics, at Michigan State University, where his lab works on developmental biology, genomics and metagenomics data analysis, and software tool development.
Greg Wilson (http://pyre.third-bit.com/blog/cv) holds a Ph.D. in Computer Science from the University of Edinburgh, and has worked on high-performance scientific computing, data visualization, and computer security. He is currently an Assistant Professor in Computer Science at the University of Toronto, where his primary interests are lightweight software engineering tools and education. Greg has served on the editorial boards of Doctor Dobb’s Journal and Computing in Science and Engineering; his most recent books are Data Crunching (Pragmatic, 2005), Beautiful Code (O’Reilly, 2007), and Practical Programming (Pragmatic, 2009).
Syllabus / daily topics
Week 1: basic computational tools and computational thinking
- UNIX and EC2
- running big jobs
- bash shell
- scripting/programming in Python
Week 2: next-gen sequencing data analysis and bioinformatics
- mapping short reads to an existing genome
- mapping/variation analysis
- mapping/RNAseq analysis
- mapping/ChIP-seq analysis
During the second week, we will encourage students with existing data for mapping/assembly to focus on their own data using the techniques we discuss. The goal is to be flexible and maximize the value of the course for each student.
Please contact Dr. Titus Brown at ctb@msu.edu with any questions or concerns.