CSE891: Algorithms and Probabilistic Models in Computational Biology
Credits: 3 (counts towards the "theory" area of the "breadth requirement")
Instructor: Yanni Sun (yannisun@msu.edu)
Tentative schedule: M W 12:40 pm - 2:00 pm
Textbook: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison
This course focuses on commonly used probabilistic models and algorithms in computational biology. Both canonical methods and their potential applications to new problems in computational biology will be covered. For example, we will cover various sequence alignment algorithms in depth and also explore how to adapt them to the short-sequence mapping problem, which often arises when analyzing high-throughput sequencing data sets.
Topics covered include sequence alignment algorithms, motif finding (e.g., the Gibbs sampling algorithm), protein sequence functional analysis (HMMs, profile HMMs, position-specific weight matrices, pHMM-pHMM alignment), noncoding RNA identification (stochastic context-free grammars), and some commonly used algorithms and models such as expectation-maximization algorithms and conditional random fields.
This course can help you achieve the following goals: (1) understand active research problems in computational biology; (2) read research papers in computational biology more efficiently; (3) gain an in-depth understanding of important computational methods and probabilistic models in computational biology and related fields. In particular, for students with biology backgrounds, this course provides an opportunity to look into the "black box" of many popular bioinformatics tools.
Prior programming experience is expected for this course. Any programming language is fine, including C/C++, MATLAB, Java, Python, Perl, etc.
This course includes about 6 homework assignments and a final project. Each homework assignment requires students to solve a well-defined computational biology problem using real biological data. In addition, students are expected to write critiques of related research papers. The final project will be a group project; collaborations between students with different backgrounds are encouraged.
—- Comparative sequence analysis and its applications
- Pairwise sequence alignment: the scoring model, alignment algorithms, BLAST and heuristic alignment algorithms, E-value computation and significance of alignment scores
- Multiple sequence alignment
- Short-sequence mapping algorithms
—- Motif finding algorithms
—- Protein sequence classification and functional analysis
- Hidden Markov models
- Profile HMMs and the associated algorithms, such as the Viterbi algorithm, the forward algorithm, the EM algorithm, and the Baum-Welch algorithm; Dirichlet mixtures
—- Protein-coding gene finding
- Gene finding and pair HMMs
- Conditional random fields
—- Phylogenetic inference
—- RNA sequence analysis
- RNA secondary structure
- Stochastic context-free grammar
- Noncoding RNA gene finding
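
To give a flavor of the pairwise alignment algorithms listed in the outline above, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). It is an illustration only, not course material: the function name and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Uses the Needleman-Wunsch recurrence with a linear gap penalty.
    The scoring parameters are illustrative assumptions.
    """
    # prev[j] holds the best score for aligning the first i-1 characters
    # of a with the first j characters of b. Row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the dynamic-programming table are kept, so memory grows with the length of the shorter dimension rather than with the product of both lengths; recovering the alignment itself would additionally require a traceback over the full table.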