STA 6973 Course Description
Special Problems in Statistics: Statistical Bioinformatics
Objectives: The purpose of this course is
to introduce students to bioinformatics, an interdisciplinary area of
study that combines techniques and knowledge from the mathematical, statistical,
computational, and life sciences to understand the biological significance
of genetic sequence data. Through hands-on projects, students will learn to
use programs from the Wisconsin Package of molecular sequence analysis
software and other bioinformatics resources on the Internet. Lectures and
class discussion will emphasize the statistical models and methods underlying
the algorithm design of these computational tools.
Instructor:
Dr. Ming-Ying Leung
Office: SB 4.01.22, Division of Mathematics
and Statistics, UTSA.
Phone: (210) 458-5535; Fax: (210) 458-4439
Email: mleung@utsa.edu
http://www.math.utsa.edu/~leung/
Office Hours: MW 6:00 - 6:45 pm
Syllabus: We shall cover topics 1 - 5 in the first half of the
semester. Depending on available time and students' interest, topics from
6 - 10 will be picked for in-depth discussion.
1. What is Bioinformatics? Historical development, the role of mathematics
   and statistics, current challenges and future outlook.
2. Introduction to Computational Resources for Bioinformatics. The
   UNIX system, World Wide Web, sequence and structure databases, bioinformatics
   and statistics software packages.
3. A Primer in Molecular Genetics. Definition of a gene, gene expression,
   protein structure and function, conserved sequence motifs, introduction
   to genome projects.
4. Probability and Statistics. Probability, statistical inference in
   a Bayesian framework, random sequence models, Markov chains and hidden
   Markov models.
5. Sequence Alignment and Database Search. Alignment methods for two
   sequences, database similarity search using BLAST and FASTA, multiple sequence
   alignment.
6. Genetic Engineering Analyses of Living Systems. Sequencing techniques,
   large-scale sequencing projects, physical genome maps and clone libraries,
   the sequence assembly problem.
7. Prediction of Functional Units on Genomes. Prediction of coding
   regions and regulatory sites.
8. Evolution. Phylogenetic models, sequence alignment, tree-building
   methods, tree evaluation by bootstrap analysis.
9. Protein Structure Prediction. Molecular organization of protein
   molecules, from Chou and Fasman to neural networks, secondary structure
   prediction, tertiary structure prediction.
10. DNA Sequence Patterns. Repeating elements, over- and under-representation
    of short oligonucleotides, palindrome clusters, detecting periodicity in
    DNA.
Textbook: No textbook will be required. Course materials will be
mainly from:
- Introduction to Computational Biology - Maps, Sequences and Genomes,
  by Michael Waterman (1995).
- Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins,
  edited by A.D. Baxevanis and B.F. Francis Ouellette (1998).
- Bioinformatics: The Machine Learning Approach, by Pierre Baldi and
  Soren Brunak (1998).
Grading:
Exercises              40% (Due in class almost every Monday)
In-class test          20% (Wednesday, 3/8)
In-class presentation  10% (4/19, 4/24, 4/26)
Final project          30% (Due Wednesday, 5/10, by 10:45 pm)