ATD Collaborative Research: Statistical Ensembles for the Identification of Bacterial Genomes
PI: Adrian Dobra
Sponsor: ATD Collaborative Research: Statistical Ensembles for the Identification of Bacterial Genomes
Project Period: -
As defined by the Centers for Disease Control and Prevention, a bioterrorism attack is the deliberate release of viruses, bacteria, or other germs used to cause illness or death in people, animals, or plants. The use of microorganisms to cause disease is a growing concern for public health officials and national defense agencies in light of the terrorist attacks of September 11, 2001, and the subsequent release of anthrax targeting individuals in Congress and the media. There exist biological agents that, if used effectively as weapons, could pose a substantial public health challenge to our ability to limit the damage to both our citizens and our nation. One scientific initiative to reduce the threat of bioterrorism is the development of mathematical and statistical methods for rapidly identifying genome differences and accurately classifying bacterial genomes as harmless or potentially pathogenic. The main objective of this proposal is to develop high-dimensional classification and clustering tools for this purpose. We consider three statistical approaches to identifying bacterial genomes in a given bacterial "soup": (1) classification by overlap enrichment; (2) comparison of empirical clusterings and consensus genomes; and (3) shrinkage estimation and model selection in hierarchical log-linear models.