dc.contributor.advisor: Quackenbush, John
dc.contributor.author: Schlauch, Daniel
dc.date.accessioned: 2019-05-20T10:22:50Z
dc.date.created: 2017-05
dc.date.issued: 2017-05-04
dc.date.submitted: 2017
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:40046439
dc.description.abstract: The explosion of data arising from advances in high-throughput sequencing has allowed scientists to study genomics in far greater detail. However, this high-resolution picture of cells often makes it difficult to see the higher-level functions and features in the biology that lead to phenotypic outcomes. Identifying the structure hidden in genomic data is critical to separating the data patterns that we consider artifactual, such as batch effect or population structure, from those that we consider signal. In chapter 2, we address a problem in estimating genetic similarity more accurately, which is important for inferring population structure in a sample. We exploit the relative informativeness of rare variants to more precisely inform our measurement. We then show that this precision can be used to easily test assumptions of homogeneity and identify cryptically related individuals. In chapter 3, we propose a method in transcriptomics that similarly identifies and controls for unwanted latent structure. Batch effect has been widely described in the literature, but we specifically consider the impact of batch on coexpression, a concept critical to gene network inference. Our method uses a regression approach to control for this effect by estimating a reduced number of parameters that describe the coexpression matrix as a function of the covariates. Finally, in chapter 4, we demonstrate an approach for finding transcription factor drivers of cell state transitions using gene regulatory network (GRN) models. The best way to characterize the rewiring that occurs in GRNs between phenotypic states is unclear, and gold standards are nearly non-existent. We propose an approach that estimates a matrix describing the change in the network adjacency matrix between two states and demonstrate it by applying it to four separate studies of COPD. Together, these chapters present three contributions to our understanding of genomic data.
Fundamentally, each method described here estimates specific types of hidden underlying structure in complex, high-dimensional settings. In each context, estimating this structure allows us to better understand how genomic features lead to phenotype.
dc.description.sponsorship: Biostatistics
dc.format.mimetype: application/pdf
dc.language.iso: en
dash.license: LAA
dc.subject: Biology, Biostatistics
dc.subject: Statistics
dc.subject: Biology, Bioinformatics
dc.title: Methods for Estimating Hidden Structure and Network Transitions in Genomics
dc.type: Thesis or Dissertation
dash.depositing.author: Schlauch, Daniel
dc.date.available: 2019-05-20T10:22:50Z
thesis.degree.date: 2017
thesis.degree.grantor: Graduate School of Arts & Sciences
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
dc.contributor.committeeMember: Lange, Christoph
dc.contributor.committeeMember: Glass, Kimberly
dc.type.material: text
thesis.degree.department: Biostatistics
dash.identifier.vireo: http://etds.lib.harvard.edu/gsas/admin/view/1467
dc.description.keywords: Genomics; Statistical Genetics; Gene Regulatory Network Inference; High-dimensional data; Batch effect
dash.author.email: dschlauch@gmail.com