The genetic analysis of correlated high-dimensional traits is hampered by a large multiple testing burden. In this talk, I will discuss computational strategies that fully exploit large high-dimensional datasets by testing for genetic effects that are shared across traits or specific to some of them. In the first part of the talk, I will outline how computationally efficient multivariate linear mixed models can be used to identify genetic linkages between genomic regions and multiple correlated traits [1]. This approach scales to very large cohorts of up to 500,000 individuals and up to tens of traits, while simultaneously correcting for population structure and non-genetic sources of trait correlation.
In the second part of the talk, I will discuss approaches to map the determinants of gene expression levels and other molecular intermediates. These large-scale expression datasets are often compromised by hidden structure between samples. In the context of genetic association studies, this structure can be linked to differences between individuals, which may reflect their genetic makeup (such as population structure) or be traced back to environmental and technical factors. I will discuss statistical methods that reconstruct this structure from the observed data so that it can be accounted for in genetic analyses [2,3]. By incorporating principles from causal reasoning, we show how these methods can be extended to avoid the critical pitfall of falsely explaining away true biological signals.
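As a rough illustration of reconstructing hidden structure from the observed data, the sketch below estimates hidden factors from a simulated expression matrix by principal component analysis and regresses them out before testing a variant. It is a deliberately naive stand-in, not the methods of references 2 or 3; the simulated data, the function names `estimate_hidden_factors` and `regress_out`, and the choice of two factors are all assumptions made for this example.

```python
import numpy as np

def estimate_hidden_factors(Y, k):
    """Top-k left singular vectors of the column-centered matrix Y (samples x genes)."""
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k]

def regress_out(Y, F):
    """Residuals of a least-squares fit of Y on the factors F plus an intercept."""
    X = np.column_stack([np.ones(F.shape[0]), F])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ beta

# Simulated data (hypothetical): 200 samples, 50 genes, 2 hidden confounders.
rng = np.random.default_rng(0)
n, g, k = 200, 50, 2
hidden = rng.normal(size=(n, k))
loadings = rng.normal(size=(k, g)) * 3.0
snp = rng.integers(0, 3, size=n).astype(float)  # genotype coded 0/1/2
Y = hidden @ loadings + rng.normal(size=(n, g))
Y[:, 0] += 0.5 * snp  # true genetic effect on gene 0 only

F = estimate_hidden_factors(Y, k)
Y_adj = regress_out(Y, F)

# Marginal association of the variant with gene 0, before and after adjustment.
r_raw = np.corrcoef(snp, Y[:, 0])[0, 1]
r_adj = np.corrcoef(snp, Y_adj[:, 0])[0, 1]
print(f"correlation before adjustment: {r_raw:.3f}, after: {r_adj:.3f}")
```

Because the factors are estimated without regard to genotype, a variant that influences many genes at once could be absorbed into the factors and removed along with the confounding. That is the kind of pitfall the causal-reasoning extensions mentioned above are meant to address.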
1: Casale, Francesco Paolo, et al. "Efficient set tests for the genetic analysis of correlated traits." Nature Methods (2015).
2: Fusi, Nicoló, Oliver Stegle, and Neil D. Lawrence. "Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies." PLoS Comput Biol 8.1 (2012): e1002330.
3: Stegle, Oliver, et al. "A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies." PLoS Comput Biol 6.5 (2010): e1000770.