### Computational Challenges in Biochemical Networks: Multiscale Modelling and Inverse Problems.

#### The University of Manchester, 30th and 31st August 2016.

**Organiser: Simon Cotter**

**The workshop will be held in the Frank Adams room in the Alan Turing building, Oxford Rd, Manchester M13 9PL. All participants who booked accommodation through the registration process are staying at the Holiday Inn, 2-4 Oxford Road, Manchester, M1 5QA , United Kingdom. Lunch will be provided on both days of the workshop. The workshop dinner will be held at a nearby Japanese restaurant, Samsi (http://www.samsi.co.uk). We look forward to seeing you all at the end of August.**

**Registration is now open!**Confirmed speakers at the event should register here.

All other delegates should register here.

**Please note that accommodation bookings have now closed. Therefore, anyone now registering for the event will need to arrange their own accommodation. **

Speakers at the event will be staying at the Holiday Inn Express on Oxford Road. The Pendulum Hotel on Sackville Street and the Ibis on Charles Street are also convenient locations for this event.

Mathematical modelling is playing an increasingly important role in biological research. Models can help with the understanding of processes from modelling interactions of individual molecules within a cell, through to species-level models of population interactions and behaviour. In this workshop, we focus on the modelling and inverse problems arising from the study of biochemical networks, leading to a better understanding of the processes which govern and regulate cell behaviour.

Biochemical networks can be incredibly complex, stochastic, and often highly multiscale systems. That is, there are some reactions within the system which are occurring many times on a time scale for which other reactions are unlikely to occur at all. Since the gold standard Stochastic Simulation Algorithm for simulating these systems simulates every single reaction in the system, this multiscale property can make the SSA computationally intractable. Multiscale methods allow us to approximate the dynamics of the system accurately, without the need for expensive simulation of all of the fast reactions within the system.

New technologies are allowing us to make increasingly accurate and detailed observations of the chemical reactions which are occurring in cells. This has led to another area of rapid growth, the inverse problem of characterising these networks from observations. The range of approaches is broad, and many of them are extremely computationally intensive, and this is an area of lush further potential.

Both of these sets of problems present many mathematical and computational challenges, and in this workshop, we aim to present a snapshot of the work which is currently being undertaken across several communities in this area, including applied mathematics, statistics, mathematical biology and systems biology. The work presented will be split between researchers who are working on methods to efficiently model, simulate and analyse multi scale networks, and researchers who are working on problems of characterising such networks from observations.

This workshop was made possible by EPSRC First Grant award EP/L023393/1.

**List of confirmed speakers:**

Ruth Baker, Oxford University, UK - Multi-level Monte Carlo methods for simulation and parameter sensitivity analysis of biochemical reaction networks.

Simon Cotter, University of Manchester, UK - Computational Challenges in Biochemical Networks: Multiscale Modelling and Inverse Problems.

Radek Erban, Oxford University, UK - From dynamics to reaction networks (an inverse problem) and from molecules to whole-cell modelling (a multiscale challenge).

Ramon Grima, Edinburgh University, UK - Taking into account volume exclusion in spatial stochastic models of biochemical systems.

Michal Komorowski, Polish Academy of Sciences, Poland - Computational methods to understand information flow in biochemical networks.

Linda Petzold, UCSB, US - Stochastic Simulation at Your Service.

Alexis Boukouvalas, University of Manchester, UK - Identification of branching using pseudotime estimation.

Chris Sherlock, Lancaster University, UK - Delayed acceptance [particle] MCMC for inference on reaction networks.

Michael Stumpf, Imperial College, UK - Hypotheses-generation and network inference from a single-cell data using multivariate information measures.

Darren Wilkinson, Newcastle University, UK - Scalable algorithms for Markov process parameter inference.

Kostas Zygalakis, Edinburgh University, UK - Hybrid modelling of stochastic chemical kinetics.

**Programme**

**Tuesday 30 August**10:30 - Coffee and registration

10:55 - Welcome and introduction

11:00 - Talk 1: Linda Petzold

11:45 - Talk 2: Michal Komorowski

12:30 - Lunch break

13:30 - Talk 3: Ramon Grima

14:15 - Talk4: Michael Stumpf

15:00 - Coffee break

15:30 - Talk 5: Kostas Zygalakis

16:15 - Talk 6: Darren Wilkinson

18:30 - Conference dinner (venue tbc)

**Wednesday 31 August**

08:30 - Coffee

09:00 - Talk 7: Ruth Baker

09:45 - Talk 8: Alex Boukouvalas

10:30 - Coffee break

11:00 - Talk 9: Radek Erban

11:45 - Talk 10: Chris Sherlock

12:30 - Lunch break

13:30 - Talk 11: Simon Cotter

14:15 - Wrap up

**Talk abstracts:**

**Ruth Baker** - Computationally efficient inference methods for biochemical reaction networks.

Discrete-state, continuous-time Markov models are widely used to model biochemical reaction networks. Their complexity generally means that we rely on Monte Carlo simulation to estimate summary statistics of interest. The multi-level Monte Carlo method employs a variance reduction approach that involves generating sample paths with different accuracies in order to estimate these summary statistics. A base estimator is computed using many (cheap) paths at low accuracy. The bias inherent in this estimator is then reduced using a number of correction estimators. Each correction term is estimated using a collection of (increasingly expensive) paired sample paths where one path of each pair is generated at a higher accuracy compared to the other. By sharing randomness between these paired sample paths a relatively small number of paired paths are required to calculate each correction term. In this talk I will describe some variants of the multi-level Monte Carlo method, and show how they can be used efficiently to estimate distributions of species numbers and undertake parameter sensitivity analysis.

**Simon Cotter - **Computational Challenges in Biochemical Networks: Multiscale Modelling and Inverse Problems.

The mathematical simulation and analysis of complex biochemical reaction networks presents significant challenges across a range of mathematical disciplines. The dynamical systems in question are often highly complex, being nonlinear, stochastic, multiscale and often spatially inhomogeneous. Recent advances allow us to make increasingly high resolution observations of the chemical processes which regulate key processes in living cells. The act of combining these large noisy datasets with complex models in order to make accurate inferences on the structure and reaction rates within these networks is also a significant challenge. In this talk I will summarise the areas that I have been working on, namely multiscale modelling of well-mixed biochemical stochastic networks, and the development of a scalable inferential algorithm, the Parallel Adaptive Importance Sampler (PAIS).

**Radek Erban** - From dynamics to reaction networks (an inverse problem) and from molecules to whole-cell modelling (a multiscale challenge).

I will present our recent work on two problems related to two themes of this research workshop.

In the first part of my talk, I will discuss an inverse problem of constructing mass-action chemical reaction networks with given dynamical behaviour. Such constructions are useful in many application areas. In synthetic biology, such constructed systems may be used as a blueprint for engineering artificial networks. They also add to the set of test problems for numerical methods designed for network inference from data. I will discuss constructions of chemical reaction networks with both (i) prescribed deterministic behaviour (bifurcation structure) and (ii) prescribed stochastic behaviour (including state-dependent fluctuations). This is joint work with D. Anderson, T. Plesa, T. Vejchodsky and K. Zygalakis.

In the second part of my talk, I will discuss methods for spatio-temporal modelling in molecular and cell biology, including all-atom and coarse-grained molecular dynamics (MD) and stoshastic reaction-diffusion models, with the aim of developing and analysing multiscale methods which use MD simulations in parts of the computational domain and (less-detailed) stochastic reaction-diffusion approaches in the remainder of the domain. The main goal of this multiscale methodology is to use a detailed modelling approach in localized regions of particular interest (in which accuracy and microscopic details are important) and a less detailed model in other regions in which accuracy may be traded for simulation efficiency [2,3].

[1] T. Plesa, T. Vejchodsky and R. Erban, "Chemical reaction systems with a homoclinic bifurcation: an inverse problem", Journal of Mathematical Chemistry, doi:10.1007/s10910-016-0656-1 (2016)

[2] R. Erban, "Coupling all-atom molecular dynamics simulations of ions in water with Brownian dynamics", Proceedings of the Royal Society A, Volume 472, Number 2186, 20150556 (2016)

[3] R. Erban, "From molecular dynamics to Brownian dynamics", Proceedings of the Royal Society A, Volume 470, Number 2167, 20140036 (2014)

**Ramon Grima** - Taking into account volume exclusion in spatial stochastic models of biochemical systems.

Spatial stochastic effects in chemical reaction systems have been mostly studied via the reaction-diffusion chemical master equation (RDME), a spatial discrete stochastic formulationof chemical kinetics which assumes well-mixing on small local scales and point-like interactions between molecules. However, it is well known that the intracellular environment is highly non-dilute implying that volume exclusion effects are important. I will here describe our recent work on modifying the RDME using scaled particle theory to describe volume exclusion effects on the diffusion of molecules in homogenous and heterogeneous media. I will show that stochastic simulations using this new RDME and a mean-field theory based on it, are in excellent agreement with the considerably more computationally expensive technique of Brownian dynamics. The theory also leads to closed-form expressions relating the size of particles and their abundance to their diffusion and advection properties.

**Michal Komorowski** - Computational methods to understand information flow in biochemical networks.

Highthroughput experimental approaches and the advances of computational capabilities continue to reshape modern biology. Thus, mathematical biology offers new possibilities for mathematicians and biologists to explore signal transduction, an important and highly complex aspect of cell biology. The molecular mechanisms how cells transduce biochemical signals are widely understood. Biochemical descriptions however failed to reveal how stimuli are translated into distinct responses. Signalling pathways are highly complex as they are functionally pleitropic and biochemical reactions intrinsically stochastic. In the talk I will present how mathematical methods of information theory and probabilistic dynamical modelling can contribute to understanding of how information flows in signalling pathways.

**Linda Petzold** - Stochastic Simulation at Your Service.

Stochasticity plays an important role in many biological processes. At the same time, algorithms and software for discrete stochastic simulation have advanced to the point where not only simulation of well-mixed systems, but spatial stochastic simulation on 3D complex geometries, parameter estimation for stochastic systems, and efficient computation of the probability of rare events, can be performed with accuracy and reliability. A few years ago we embarked on a quest to build a unified software environment to enable biologists to easily harness the power of these tools. we envisioned that users might build an ODE model or discrete stochastic model on a laptop, and scale it up to increasing levels of complexity, accessing tools such as those mentioned above, and deploying computing resources from the cloud with the push of a button when they are needed. SochSS: Stochastic Simulation as-a-Service, is available for download at www.stochss.org. As the capabilities of StochSS have grown, so has our vision of the roles that computing can play in the advancement of science.

**Alexis Boukouvalas** - Identification of branching using pseudotime estimation.

Capturing cells at different stages of differentiation, we perform a pseudotime reconstruction which allows an unsychronized cell population to be placed on a developmental continuum. Recent work has shown how pseudotime can be inferred from single-cell data with capture time information using the GPLVM probabilistic model (Reid and Wernisch, 2016). We extend this model to infer pseudotime across a branching process, building on recent work on time course expression data (Yang et al. 2016). We use this approach to identify the earliest developmental point where cell fate decisions are evident and rank genes in terms of earliest divergence. By focussing on the times at which different genes diverge we are able to identify the earliest pathways and processes driving differentiation.**Chris Sherlock** - Delayed acceptance [particle] MCMC for inference on reaction networks.

When conducting Bayesian inference, delayed acceptance (DA) Metropolis-Hastings (MH) algorithms and DA pseudo-marginal MH algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased estimate thereof, but a computationally-cheap approximation is available. A first accept-reject stage is applied, with the cheap approximation substituted for the true posterior in the MH acceptance ratio. Only for those proposals which pass through the first stage is the computationally expensive true posterior (or unbiased estimate thereof) evaluated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the intended true posterior. For some reaction networks the Linear Noise Approximation provides a cheap and relatively accurate approximation whereas in other scenarios there is no obvious computationally-cheap surrogate. In such cases a weighted average of previous evaluations of the computationally expensive posterior provides a generic approximation. I will discuss inference for reaction networks using delayed acceptance MCMC in each of the above scenarios and, if time, overview some theoretical investigations into the delayed acceptance random walk Metropolis that provide heuristics for tuning the algorithms.

**Michael Stumpf** - Hypotheses-generation and network inference from single-cell data using multivariate information measures.

Precisely controlled patterns of gene expression are essential for the survival and reproduction of all life-forms. Intricate networks of transcriptional activators and repressors have evolved to regulate the spatial and temporal expression of genes, enabling organisms to adjust transcription levels in response to environmental, developmental and physiological cues. Elucidating the structure of such gene-regulatory networks (GRNs) has been a central goal of much recent systems biology research and continues to constitute a major statistical and computational challenge. Here I will discuss how multivariate information theoretical measures can be used to study GRNs underlying different developmental processes in health and disease.

**Darren Wilkinson** - Scalable algorithms for Markov process parameter inference.

Inferring the parameters of continuous-time Markov process models using partial discrete-time observations is an important practical problem in many fields of scientific research. Such models are very often "intractable", in the sense that the transition kernel of the process cannot be described in closed form, and is difficult to approximate well. Nevertheless, it is often possible to forward simulate realisations of trajectories of the process using stochastic simulation. There have been a number of recent developments in the literature relevant to the parameter estimation problem, involving a mixture of approximate, sequential and Markov chain Monte Carlo methods. This talk will compare some of the different "likelihood free" algorithms that have been proposed, including sequential ABC and particle marginal Metropolis Hastings, paying particular attention to how well they scale with model complexity. Emphasis will be placed on the problem of Bayesian parameter inference for the rate constants of stochastic biochemical network models, using noisy, partial high-resolution time course data.

**Kostas Zygalakis** - Hybrid modelling of stochastic chemical kinetics.

It is well known that stochasticity can play a fundamental role in various biochemical processes, such as cell regulatory networks and enzyme cascades. Isothermal, well-mixed systems can be adequately modelled by Markov processes and, for such systems, methods such as Gillespie's algorithm are typically employed. While such schemes are easy to implement and are exact, the computational cost of simulating such systems can become prohibitive as the frequency of the reaction events increase. This has motivated numerous coarse grained schemes, where the 'fast' reactions are approximated either using Langevin dynamics or deterministically. While such approaches provide a good approximation for systems where all reactants are present in large concentrations, the approximation breaks down when the fast chemical species exist in small concentrations, giving rise to significant errors in the simulation. This is particularly problematic when using such methods to compute statistics of extinction times for chemical species, as well as computing observables of cell cycle models. In this talk, we present a hybrid scheme for simulating well-mixed stochastic kinetics, using Gillespie-type dynamics to simulate the network in regions of low reactant concentration, and chemical Langevin dynamics when the concentrations of all species is large. These two regimes are coupled via an intermediate region in which a 'blended' jump-diffusion model is introduced. Examples of gene regulatory networks involving reactions occuring at multiple scales, as well as a cell-cycle model are simulated, using the exact and hybrid scheme, and compared, both in terms weak error, as well as computational cost.