STA 6973 Course Description
Special Problems in Statistics: Statistical Bioinformatics

Objectives: The purpose of this course is to introduce students to bioinformatics, an interdisciplinary area of study that combines techniques and knowledge from the mathematical, statistical, computational, and life sciences to understand the biological significance of genetic sequence data. Through hands-on projects, students will learn to use programs from the Wisconsin package of molecular sequence analysis software and other bioinformatics resources on the Internet. Lectures and class discussion will emphasize the statistical models and methods underlying the algorithm design of these computational tools.

Instructor:    Dr. Ming-Ying Leung
                        Office: SB 4.01.22, Division of Mathematics and Statistics, UTSA
                        Phone: (210) 458-5535; Fax: (210) 458-4439
                        Email: mleung@utsa.edu
                         http://www.math.utsa.edu/~leung/

Office Hours:    MW 6:00 - 6:45 pm

Syllabus: We shall cover topics 1-5 in the first half of the semester. Depending on available time and students' interest, topics from 6-10 will be selected for in-depth discussion.

  1. What is Bioinformatics? Historical development, the role of mathematics and statistics, current challenges and future outlook.
  2. Introduction to Computational Resources for Bioinformatics. The UNIX system, World Wide Web, sequence and structure databases, bioinformatics and statistics software packages.
  3. A Primer in Molecular Genetics. Definition of a gene, gene expression, protein structure and function, conserved sequence motifs, introduction to genome projects.
  4. Probability and Statistics. Probability, statistical inference in a Bayesian framework, random sequence models, Markov chains and Hidden Markov models.
  5. Sequence Alignment and Database Search. Alignment methods for two sequences, database similarity search using BLAST and FASTA, multiple sequence alignment.
  6. Genetic Engineering Analyses of Living Systems. Sequencing techniques, large scale sequencing projects, physical genome maps and clone libraries, sequence assembly problem.
  7. Prediction of Functional Units on Genomes. Prediction of coding regions and regulatory sites.
  8. Evolution. Phylogenetic models, sequence alignment, tree building methods, tree evaluation by bootstrap analysis.
  9. Protein Structure Prediction. Molecular organization of protein molecules, from Chou and Fasman to neural networks, secondary structure prediction, tertiary structure prediction.
  10. DNA Sequence Patterns. Repeating elements, over- and under-representation of short oligonucleotides, palindrome clusters, detecting periodicity in DNA.

Textbook: No textbook will be required. Course materials will be drawn mainly from:
  1. Introduction to Computational Biology: Maps, Sequences and Genomes, by Michael Waterman (1995).
  2. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, edited by A.D. Baxevanis and B.F. Francis Ouellette (1998).
  3. Bioinformatics: The Machine Learning Approach, by Pierre Baldi and Soren Brunak (1998).

Grading:        Exercises                  40%    (due in class almost every Monday)
                In-class test              20%    (Wednesday, 3/8)
                In-class presentation      10%    (4/19, 4/24, 4/26)
                Final project              30%    (due Wednesday, 5/10, by 10:45 pm)