Methods for Estimating Hidden Structure and Network Transitions in Genomics

Schlauch, Daniel

dc.contributor.advisor	Quackenbush, John
dc.contributor.author	Schlauch, Daniel
dc.date.accessioned	2019-05-20T10:22:50Z
dc.date.created	2017-05
dc.date.issued	2017-05-04
dc.date.submitted	2017
dc.identifier.uri	http://nrs.harvard.edu/urn-3:HUL.InstRepos:40046439	*
dc.description.abstract	The explosion of data arising from advances in high-throughput sequencing has allowed scientists to study genomics in far greater detail. However, this high-resolution picture of cells often makes it difficult to see the higher-level functions and features in the biology that lead to phenotypic outcomes. Identifying the structure hidden in genomic data is critical to separating the data patterns that we consider artifactual, such as batch effect or population structure, from those that we consider signal. In chapter 2, we address the problem of estimating genetic similarity more accurately, which is important for inferring population structure in a sample. We exploit the relative informativeness of rare variants to make our measurement more precise. We then show that this precision can be used to easily test assumptions of homogeneity and identify cryptically related individuals. In chapter 3, we propose a method in transcriptomics that similarly identifies and controls for unwanted latent structure. Batch effect has been widely described in the literature, but we specifically consider the impact of batch on coexpression, a concept critical to gene network inference. Our method uses a regression approach to control for this effect by estimating a reduced number of parameters that describe the coexpression matrix as a function of the covariates. Finally, in chapter 4, we demonstrate an approach for finding transcription factor drivers of cell state transitions using gene regulatory network (GRN) models. The best way to characterize the rewiring that occurs in GRNs between phenotypic states is unclear, and gold standards are nearly non-existent. We propose an approach that estimates a matrix describing the change in the network adjacency matrix between two states and demonstrate it by applying it to four separate studies of COPD. Together, these chapters present three contributions to our understanding of genomic data.
Fundamentally, each method described here estimates specific types of hidden underlying structure in complex, high-dimensional settings. In each context, estimating this structure allows us to better understand how genomic features lead to phenotype.
dc.description.sponsorship	Biostatistics
dc.format.mimetype	application/pdf
dc.language.iso	en
dash.license	LAA
dc.subject	Biology, Biostatistics
dc.subject	Statistics
dc.subject	Biology, Bioinformatics
dc.title	Methods for Estimating Hidden Structure and Network Transitions in Genomics
dc.type	Thesis or Dissertation
dash.depositing.author	Schlauch, Daniel
dc.date.available	2019-05-20T10:22:50Z
thesis.degree.date	2017
thesis.degree.grantor	Graduate School of Arts & Sciences
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy
dc.contributor.committeeMember	Lange, Christoph
dc.contributor.committeeMember	Glass, Kimberly
dc.type.material	text
thesis.degree.department	Biostatistics
dash.identifier.vireo	http://etds.lib.harvard.edu/gsas/admin/view/1467
dc.description.keywords	Genomics; Statistical Genetics; Gene Regulatory Network Inference; High-dimensional data; Batch effect
dash.author.email	dschlauch@gmail.com

Files in this item

Name: SCHLAUCH-DISSERTATION-2017.pdf
Size: 12.71 MB
Format: PDF

This item appears in the following Collection(s)

FAS Theses and Dissertations [6138]