Three Aspects of Gene Expression: Pathway Coexpression, Cross-Species Analysis of RNA-seq Data, and Bias in Gene Coexpression
Abstract
In this dissertation, we propose methods for analyzing gene expression data that address three problems in functional genomics: describing relationships between biological pathways, comparing tissues from different species, and accounting for biases in gene coexpression.
In chapter 1, we present a pathway coexpression network that systematically quantifies high-level relationships between pathways and establishes a reference for them. The method uses 3,207 microarrays from 72 normal human tissues and 1,330 of the most well-established pathway annotations to describe global relationships between pathways. The pathway coexpression network accounts for shared genes, so that the estimated correlations between pathways reflect related functions rather than redundant annotations.
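To illustrate why shared genes matter, the sketch below correlates two pathways after removing the genes they have in common, summarizing each pathway by the mean z-scored expression of its remaining genes. This is a minimal, hypothetical illustration assuming a genes x samples matrix and plain gene lists; the function name `pathway_correlation` and the mean-of-z-scores summary are assumptions, not the dissertation's method.

```python
import numpy as np


def pathway_correlation(expr, genes, set_a, set_b):
    """Hypothetical sketch: correlate two pathways after dropping shared genes.

    expr  : genes x samples array of expression values
    genes : gene names, one per row of expr
    set_a, set_b : lists of gene names defining the two pathways
    """
    shared = set(set_a) & set(set_b)
    index = {g: i for i, g in enumerate(genes)}
    # Keep only genes unique to each pathway (and present in the matrix)
    rows_a = [index[g] for g in set_a if g not in shared and g in index]
    rows_b = [index[g] for g in set_b if g not in shared and g in index]
    if not rows_a or not rows_b:
        return float("nan")  # nothing left to compare
    # z-score each gene across samples so no single gene dominates the summary
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    score_a = z[rows_a].mean(axis=0)
    score_b = z[rows_b].mean(axis=0)
    return float(np.corrcoef(score_a, score_b)[0, 1])
```

Without the removal step, a gene present in both lists contributes identically to both summaries and pushes their correlation toward 1 regardless of function; a gene with constant expression would yield NaN in this simple version.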
In chapter 2, we propose a method to adjust RNA-seq expression estimates from human and mouse tissues for differences between the two species' genomic annotations. Previous studies that used gene expression data to compare homologous genes across species concluded that gene expression was more similar between homologous tissues of different species than between different tissues of the same species. Recently, the Mouse ENCODE consortium reached the opposite conclusion, reporting that gene expression data from human and mouse samples cluster by species rather than by tissue. We showed that these results were driven by differences between the human and mouse annotations. Our method quantifies gene expression using ortholog probes: genomic regions within human-mouse orthologs that have the same length and almost identical sequences. With ortholog probes, the human and mouse samples cluster by tissue rather than by species.
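One simple way to ask whether samples "cluster by tissue or by species" is to check, for each sample, whether its most correlated other sample shares its tissue or its species. The sketch below does this on log-transformed counts, assuming a probes x samples count matrix (for example, counts over ortholog probes) and per-sample labels; the function name and the nearest-neighbor diagnostic are illustrative assumptions, not the analysis used in the dissertation, which may rely on hierarchical clustering or another procedure.

```python
import numpy as np


def nearest_neighbor_labels(expr, tissues, species):
    """Hypothetical diagnostic: for each sample, does its most correlated
    other sample share its tissue? Its species?

    expr    : probes x samples array of read counts
    tissues : tissue label per sample (column)
    species : species label per sample (column)
    Returns (number of samples whose neighbor shares tissue,
             number of samples whose neighbor shares species).
    """
    # log2 with a pseudocount of 1 to tame the dynamic range of counts
    logx = np.log2(np.asarray(expr, dtype=float) + 1.0)
    corr = np.corrcoef(logx, rowvar=False)  # samples x samples
    np.fill_diagonal(corr, -np.inf)         # a sample is not its own neighbor
    nn = corr.argmax(axis=1)
    same_tissue = sum(tissues[i] == tissues[j] for i, j in enumerate(nn))
    same_species = sum(species[i] == species[j] for i, j in enumerate(nn))
    return int(same_tissue), int(same_species)
```

If annotation differences dominate, neighbors tend to share species; if the probes measure comparable regions in both genomes, neighbors tend to share tissue.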
In chapter 3, we used a linear model framework to estimate the correlation between genes while taking into account the experimental factors in gene expression data sets. Correlation computed from gene expression data has been a popular way to describe relationships between genes. However, these correlation estimates are challenging to interpret, since they can arise from biological as well as non-biological sources. We used a linear mixed model to quantify the influence of variation within experimental factors on the observed correlation, and a linear model to estimate the correlation between the gene-specific effects of the experimental factors.
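The idea that an observed gene-gene correlation can be driven by an experimental factor can be sketched with a single categorical factor and ordinary least squares: fitted values are the per-level means, the "between" correlation is computed on those means (a stand-in for the gene-specific effects of the factor), and the "within" correlation is computed on the residuals. This is a simplified, hypothetical illustration using one fixed factor and OLS in place of the linear mixed model described above; `decompose_correlation` is an assumed name.

```python
import numpy as np


def decompose_correlation(x, y, factor):
    """Hypothetical sketch: split the association between genes x and y
    into overall, between-level and within-level correlations for one
    categorical experimental factor (OLS one-way model, not a mixed model).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.asarray(factor)
    levels = np.unique(f)  # sorted unique levels
    # OLS fit of a one-way model: each sample's fitted value is its level mean
    mean_x = np.array([x[f == lv].mean() for lv in levels])
    mean_y = np.array([y[f == lv].mean() for lv in levels])
    idx = np.searchsorted(levels, f)  # level index of each sample
    resid_x = x - mean_x[idx]
    resid_y = y - mean_y[idx]
    overall = np.corrcoef(x, y)[0, 1]
    between = np.corrcoef(mean_x, mean_y)[0, 1]  # factor-level effects
    within = np.corrcoef(resid_x, resid_y)[0, 1]  # after removing the factor
    return float(overall), float(between), float(within)


# Two genes that move together only because of a batch-like factor:
overall, between, within = decompose_correlation(
    [1, 2, 10, 11], [2, 1, 11, 10], ["a", "a", "b", "b"])
```

Here the overall correlation is about 0.98, yet within each level the two genes are perfectly anticorrelated; the strong raw correlation comes entirely from the factor.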
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050051
Collections
- FAS Theses and Dissertations [6138]