Prediction and Network Construction using High-throughput Data
PI: Adrian E Raftery
Sponsor: Prediction and Network Construction using High-throughput Data
Amount: $615,507.00
Abstract
Gene expression microarray data are used extensively to classify tissues into types, including various types of tumor, and to predict survival, time to relapse, and other temporal quantities. Microarrays measure the expression levels of thousands of genes, all of which are potential predictors. This poses difficult statistical problems because the number of genes is far larger than the number of tissue samples typically available. We propose to develop Bayesian model averaging (BMA) methods to deal with this problem and to produce simple, reliable, robust, and interpretable predictions of the presence or type of tumor and of the probability of, and time to, relapse. Because BMA assigns each gene a posterior probability of belonging in the predictive model, it also provides a probabilistic gene selection method. We also propose to investigate using properties of expression networks (e.g., highly connected hub genes) to identify biologically meaningful predictive genes. We will also apply and extend the BMA methods to identify which network modules and known gene categories (e.g., GO categories, KEGG pathways) are predictive. As part of this work, we will develop and extend the latent position cluster model, a recent method for social networks, to infer expression networks and to identify gene modules. The main thrusts of the research will be: (1) BMA for multi-class classification and survival analysis using gene expression data; (2) the latent position cluster model for inferring expression networks and identifying network modules; (3) prediction using network modules and gene categories; (4) generation of expression perturbation data to test our network construction methods in yeast; and (5) production and distribution of software tools.
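The BMA idea in the abstract can be illustrated with a small sketch. This is not the project's software or the investigators' algorithm; it is a simplified stand-in under stated assumptions: a continuous response fitted by ordinary least squares (rather than classification or survival models), a correlation-based pre-screen to cope with far more genes than samples, exhaustive enumeration of small models, and BIC-based approximate posterior model weights. The function names `bma_linear` and `bma_predict`, the parameters `n_screen` and `max_size`, and the synthetic data are all hypothetical.

```python
import itertools
import numpy as np


def bma_linear(X, y, n_screen=8, max_size=2):
    """Approximate BMA over small linear models (illustrative sketch only).

    Assumptions: genes are pre-screened by absolute correlation with y,
    and posterior model weights are approximated by exp(-BIC / 2).
    """
    n, p = X.shape
    # Pre-screen: keep the n_screen genes most correlated with the response.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    keep = [int(j) for j in np.argsort(corr)[::-1][:n_screen]]

    models, bics, coefs = [], [], []
    for size in range(max_size + 1):
        for subset in itertools.combinations(keep, size):
            # Design matrix: intercept plus the selected gene columns.
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = max(float(np.sum((y - design @ beta) ** 2)), 1e-12)
            bics.append(n * np.log(rss / n) + design.shape[1] * np.log(n))
            models.append(subset)
            coefs.append(beta)

    # Convert BIC values to normalized approximate posterior model weights.
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()

    # Posterior inclusion probability of a gene: total weight of the
    # models that contain it (the "probabilistic gene selection").
    incl = np.zeros(p)
    for w, subset in zip(weights, models):
        for j in subset:
            incl[j] += w
    return models, weights, coefs, incl


def bma_predict(X_new, models, weights, coefs):
    """Model-averaged prediction: weighted sum of per-model predictions."""
    m = X_new.shape[0]
    pred = np.zeros(m)
    for subset, w, beta in zip(models, weights, coefs):
        design = np.column_stack([np.ones(m), X_new[:, list(subset)]])
        pred += w * (design @ beta)
    return pred


# Synthetic example (hypothetical data): 60 samples, 200 genes,
# of which only genes 3 and 7 influence the response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 200))
y_demo = 2.0 * X_demo[:, 3] + 2.0 * X_demo[:, 7] + 0.5 * rng.normal(size=60)
models, weights, coefs, incl = bma_linear(X_demo, y_demo)
top_genes = np.argsort(incl)[::-1][:5]
print("highest inclusion probabilities:", [(int(j), round(float(incl[j]), 3)) for j in top_genes])
```

The inclusion probabilities give a ranked, probabilistic gene list, and `bma_predict` averages over models instead of committing to a single selected one. A realistic analysis would replace the least-squares fit with a classification or survival model and would likely use a more careful model search than exhaustive enumeration.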