Entropy learning in statistics and a new look at omics data integration

Korbinian Strimmer (Imperial College London)

Simon Building - Theatre D,

Entropy learning is a general framework for statistical learning that is well established in machine learning but still underappreciated in the statistics community. In the first part of my talk I will give a brief overview of entropy learning and outline its strong (and sometimes surprising) connections with statistical procedures such as maximum likelihood, Bayesian learning and general information updating, as well as its application in model selection.
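
As a small illustration of the link between entropy and maximum likelihood mentioned above (a textbook identity, not material taken from the talk), the sketch below checks numerically that, for categorical data, the average negative log-likelihood of a model q equals the entropy of the empirical distribution plus the relative entropy (Kullback-Leibler divergence) from the empirical distribution to q. Maximizing the likelihood is therefore the same as minimizing the relative entropy. The counts and the candidate model are made-up values chosen only for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_divergence(p, q):
    """Relative entropy KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * (np.log(p[nz]) - np.log(q[nz])))

def mean_neg_log_lik(counts, q):
    """Average negative log-likelihood of categorical model q given counts."""
    counts = np.asarray(counts, dtype=float)
    return -np.sum(counts * np.log(q)) / counts.sum()

# Hypothetical data: observed counts in three categories.
counts = np.array([12, 30, 58])
p_hat = counts / counts.sum()        # empirical distribution (the MLE)
q = np.array([0.2, 0.3, 0.5])        # an arbitrary candidate model

lhs = mean_neg_log_lik(counts, q)
rhs = entropy(p_hat) + kl_divergence(p_hat, q)
print(np.isclose(lhs, rhs))          # the two sides agree
```

Because the entropy term does not depend on q, the candidate with the smallest relative entropy to the empirical distribution is also the one with the largest likelihood.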

With entropy learning in mind I will then discuss the problem of joint integrative analysis of omics data. Probably the most commonly used approach is classical canonical correlation analysis (CCA), or a modern variant of it such as sparse CCA or Bayesian CCA. Other popular approaches to data integration include partial least squares (PLS) and O2PLS, which are related projection-based methods developed in chemometrics, and the RV coefficient, which measures the total association between groups of variables (genes, metabolites, etc.). Unfortunately, despite their widespread use in biomedical data analysis, all these approaches have a number of crucial drawbacks, including a lack of interpretability of the underlying factors, inconsistency with standard multivariate regression and difficulties in application to large-scale data sets.

To overcome these challenges I present a simple network-based approach to integrative data analysis that employs relative entropy to characterize the overall association between two (or more) sets of omics data. This approach is natural in the setting of latent-variable multivariate regression, where we show that it enables a canonical decomposition, which in turn allows us to infer the underlying association network among the individual constituents. Furthermore, our approach to data integration is computationally inexpensive and hence can be applied to high-dimensional data sets. It can also be easily extended to more than two data sets. We illustrate this approach, which can be interpreted as a networked extension of CCA, by analysing metabolomic and transcriptomic data.
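
To make the idea of relative entropy as an overall association measure more concrete, here is a minimal sketch under a joint Gaussian assumption. It is not the speaker's method, only a standard identity: the relative entropy between the joint distribution of two variable sets and the product of their marginals equals a sum of terms -1/2 log(1 - lambda_i^2) over the canonical correlations lambda_i. The simulated data, dimensions and function names are all hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def canonical_correlations(Sxx, Syy, Sxy):
    """Canonical correlations: singular values of the whitened cross-covariance."""
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)

def gaussian_relative_entropy(Sxx, Syy, Sxy):
    """KL(joint || product of marginals) for zero-mean Gaussians, in nats."""
    S = np.block([[Sxx, Sxy], [Sxy.T, Syy]])
    _, logdet_joint = np.linalg.slogdet(S)
    _, logdet_x = np.linalg.slogdet(Sxx)
    _, logdet_y = np.linalg.slogdet(Syy)
    return 0.5 * (logdet_x + logdet_y - logdet_joint)

# Simulated stand-ins for two omics blocks sharing two latent factors.
rng = np.random.default_rng(0)
n, p, q = 500, 4, 3
Z = rng.standard_normal((n, 2))
X = Z @ rng.standard_normal((2, p)) + rng.standard_normal((n, p))
Y = Z @ rng.standard_normal((2, q)) + rng.standard_normal((n, q))

S = np.cov(np.hstack([X, Y]), rowvar=False)
Sxx, Syy, Sxy = S[:p, :p], S[p:, p:], S[:p, p:]

lam = canonical_correlations(Sxx, Syy, Sxy)
total = gaussian_relative_entropy(Sxx, Syy, Sxy)
per_component = -0.5 * np.log1p(-lam ** 2)   # contribution of each canonical pair
print(np.isclose(total, per_component.sum()))
```

The plain sample covariance used here is only adequate when the sample size comfortably exceeds the number of variables; real high-dimensional omics data would require a regularized covariance estimate, which this sketch does not attempt.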

At the end of the talk I will briefly discuss further applications of entropy learning that can help guide the development of effective methodology for statistical learning of complex, parameter-rich models (such as Bayesian nonparametric and neural network models).
