A common form of extreme value modelling involves modelling excesses of a threshold by a generalised Pareto (GP) distribution. The GP model arises by considering the possible limiting distributions of excesses as the threshold increased. Selecting too low a threshold leads to bias from model mis-specification; raising the threshold increases the variance of estimators: a bias-variance trade-off. Many existing threshold selection methods do not address this trade-off directly, but rather aim to select the lowest threshold above which the GP model is judged to hold approximately. We use Bayesian cross-validation to address the trade-off by comparing thresholds based on predictive ability at extreme levels. Extremal inferences can be sensitive to the choice of a single threshold. We use Bayesian model averaging to combine inferences from many thresholds, thereby reducing sensitivity to the choice of a single threshold. The methodology is illustrated using significant wave height datasets from the North Sea and from the Gulf of Mexico.