Complex computer models, also called simulators, demand substantial time and memory to produce realistic results. Statistical emulators are computationally cheap approximations of such simulators. They can be built to replace simulators for various purposes, such as propagating uncertainties from inputs to outputs or calibrating internal parameters against observations. However, when the input space is high-dimensional, constructing an emulator can become prohibitively expensive.
In the first part, we introduce a joint framework that merges emulation with dimension reduction in order to overcome this hurdle. The gradient-based kernel dimension reduction (gKDR) technique is chosen for its ability to extract drastically lower-dimensional structure with little loss of information, and it is combined with the Gaussian Process (GP) emulation technique. Theoretical properties of the approximation are explored. Our proposed approach addresses the dimension reduction issue for a wide range of problems that existing methods cannot tackle. Its efficiency and accuracy are demonstrated and compared with other methods on an elliptic partial differential equation problem. An application to tsunami modelling, using the simulator VOLNA, is presented. The uncertainties in the bathymetry (seafloor elevation) are modelled as high-dimensional realizations of a spatial process. Our dimension-reduced emulation enables us to compute the impact of these uncertainties on the possible tsunami wave heights near shore and onshore. Considering an uncertain earthquake source, we observe a significant increase in the spread of the tsunami heights once the bathymetry uncertainties are added to the overall uncertainty budget. These results highlight the need to reduce bathymetry uncertainties in early warning and hazard assessment.
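The combined workflow can be sketched in a toy setting. The snippet below is a simplified illustration, not the thesis method: it uses exact gradients and an active-subspace-style eigendecomposition as a stand-in for gKDR (which estimates the gradient outer-product matrix from data via kernel regression), and a plain RBF-kernel GP with fixed hyperparameters. The simulator `f`, the hidden direction `w_true`, and all parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 120                        # nominal input dimension, training runs

# Toy "simulator" that truly varies along a single hidden direction w_true.
w_true = np.zeros(d)
w_true[0], w_true[1] = 1.0, 0.5
w_true /= np.linalg.norm(w_true)

def f(X):
    return np.sin(3.0 * (X @ w_true))

def grad_f(X):
    return 3.0 * np.cos(3.0 * (X @ w_true))[:, None] * w_true

X = rng.uniform(-1.0, 1.0, size=(n, d))
y = f(X)

# Step 1: estimate a low-dimensional projection from averaged gradient
# outer products (exact gradients here, for clarity).
G = grad_f(X)
C = G.T @ G / n
eigvecs = np.linalg.eigh(C)[1]
B = eigvecs[:, -1:]                   # leading direction, shape (d, 1)
Z = X @ B                             # reduced (1-dimensional) inputs

# Step 2: fit a plain GP (RBF kernel, fixed hyperparameters) on Z.
def rbf(A, B2, ell=0.3):
    d2 = ((A[:, None, :] - B2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

K = rbf(Z, Z) + 1e-6 * np.eye(n)      # small nugget for numerical stability
alpha = np.linalg.solve(K, y)

# Predict at new high-dimensional inputs through the same projection.
Xs = rng.uniform(-1.0, 1.0, size=(50, d))
pred = rbf(Xs @ B, Z) @ alpha
rmse = np.sqrt(np.mean((pred - f(Xs)) ** 2))
```

The point of the sketch is the two-stage structure: the emulator is trained on the projected inputs `Z` rather than the full 20-dimensional `X`, so its cost scales with the reduced dimension.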
In the second part, we present a method for the design and analysis of computer experiments. GPs are trained on input-output data obtained from simulation runs at various input values. We present a sequential design algorithm, MICE (Mutual Information for Computer Experiments), that adaptively selects the input values at which to run the computer simulator in order to maximize the expected information gain (mutual information) over the input space. The superior computational efficiency of MICE compared with other algorithms is demonstrated on test functions and on VOLNA, with overall gains of about 20% in the latter case. Moreover, there is a clear computational advantage in building a design of computer experiments solely on a subset of active variables. However, identifying these variables beforehand consumes part of the limited computational budget. We therefore interweave MICE with a screening algorithm to improve the overall efficiency of building an emulator. This approach allows us to assess future tsunami risk for complex earthquake sources over Cascadia.
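The greedy selection at the heart of a mutual-information design can be sketched as follows. This is a simplified illustration under invented settings (unit prior variance, fixed RBF length-scale, a small random candidate set on the unit square), not the MICE implementation itself: at each step it picks the candidate maximizing the ratio of the predictive variance given the current design to the predictive variance given the remaining candidates, with a nugget in the denominator.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ell=0.4):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def pred_var(x, D, tau2):
    # GP predictive variance at points x given design D
    # (unit prior variance, nugget tau2 on the design points).
    K = rbf(D, D) + tau2 * np.eye(len(D))
    k = rbf(x, D)
    return 1.0 - np.einsum('ij,ji->i', k, np.linalg.solve(K, k.T))

n_cand = 80
Xc = rng.uniform(0.0, 1.0, size=(n_cand, 2))   # candidate set on [0, 1]^2
design = [0]                                   # seed with an arbitrary candidate

for _ in range(9):                             # grow the design to 10 runs
    free = [i for i in range(n_cand) if i not in design]
    num = pred_var(Xc[free], Xc[design], tau2=1e-6)
    # Variance given the *other* remaining candidates, nugget tau2 = 1:
    den = np.array([
        pred_var(Xc[[i]], Xc[[j for j in free if j != i]], tau2=1.0)[0]
        for i in free
    ])
    design.append(free[int(np.argmax(num / den))])
```

The numerator is near zero close to already-selected runs, so the criterion naturally spreads the design over the input space; the nugget in the denominator keeps the ratio stable when candidates cluster.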