# Scalable inference; statistical, algorithmic, computational aspects

In July I attended a month-long programme at the Isaac Newton Institute, Cambridge, organised by the *i-like* project: *“Scalable inference; statistical, algorithmic, computational aspects.”* Videos of selected talks are now available online, so I thought I would highlight a few that, in my opinion, are particularly worth watching.

### IS type estimators based on approximate marginal MCMC

Matti Vihola (Jyväskylä)

Matti presented an importance sampling (IS) correction for MCMC chains that do not target the exact posterior distribution (arXiv 1609.02541). This has advantages over other “exact-approximate” methods, such as delayed acceptance (DA), because the IS correction can be performed offline, in parallel. It also has great potential to be combined with Bayesian indirect likelihood (BIL) methods, such as my surrogate model for ABC or Gaussian process approximation for pseudo-marginal MCMC.
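The idea can be sketched in a few lines (a toy illustration with made-up densities, not Matti's implementation): run a Metropolis chain targeting a cheap approximate posterior, then reweight the stored samples offline by the ratio of exact to approximate unnormalised densities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exact (unnormalised) log-posterior and a cheap approximation to it.
def log_post(theta):          # target: N(0, 1)
    return -0.5 * theta**2

def log_post_approx(theta):   # approximation: N(0.5, 1.5^2)
    return -0.5 * ((theta - 0.5) / 1.5) ** 2

# Stage 1: random-walk Metropolis targeting the *approximate* posterior.
n, theta = 20000, 0.0
chain = np.empty(n)
for i in range(n):
    prop = theta + rng.normal(scale=1.0)
    if np.log(rng.uniform()) < log_post_approx(prop) - log_post_approx(theta):
        theta = prop
    chain[i] = theta

# Stage 2: offline IS correction -- these weights are embarrassingly
# parallel, since each depends on one stored state only.
logw = log_post(chain) - log_post_approx(chain)
w = np.exp(logw - logw.max())
w /= w.sum()

post_mean = np.sum(w * chain)   # self-normalised IS estimate of E[theta]
```

The key point is that stage 2 touches the expensive exact density only after the chain has finished, so it can be farmed out across cores.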

### Exact Bayesian Inference for Big Data: Single- and Multi-Core Approaches

Murray Pollock (Warwick)

Murray presented some algorithms for combining inference from multiple, distributed sub-posteriors. The essential idea is to run a coupled Markov chain for each sub-posterior. When the chains coalesce, this gives an unbiased sample from the combined posterior distribution. These parallel methods are related to the SCALE algorithm (arXiv 1609.03436).
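As a toy illustration of coalescence (not the SCALE algorithm itself), here are several chains on a small discrete state space driven by common random numbers; once two chains meet, they remain together forever:

```python
import numpy as np

rng = np.random.default_rng(1)

# A small ergodic transition matrix on states {0, 1, 2}.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
cdf = np.cumsum(P, axis=1)

def step(state, u):
    """Advance one step by inverse-CDF sampling with a shared uniform u."""
    return int(np.searchsorted(cdf[state], u))

# One chain per starting state, all driven by the SAME uniforms.
states = [0, 1, 2]
t = 0
while len(set(states)) > 1:
    u = rng.uniform()
    states = [step(s, u) for s in states]
    t += 1

coalescence_time, common_state = t, states[0]
```

With this coupling every starting state maps to the same successor whenever the shared uniform lands in a suitable interval, so coalescence happens almost surely; the continuous-state couplings Murray described are of course far more delicate.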

### Coresets for scalable Bayesian logistic regression

Tamara Broderick (MIT)

Tamara’s talk combined ideas from computer science for data summarisation with statistical algorithms for Bayesian inference. A *coreset* is a weighted subsample of the data, intended to provide a compact representation of the full dataset while minimising information loss. This can give better results than the naïve uniform subsampling used in methods such as stochastic gradient descent (SGD) or stochastic gradient Langevin dynamics (SGLD). More details are available in her NIPS 2016 paper and on the article’s homepage.
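A crude sketch of the flavour of the approach (using an ad-hoc sensitivity proxy, not the bounds from the paper): subsample data points with non-uniform probabilities and reweight them so that the weighted log-likelihood is unbiased for the full-data one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic logistic-regression data.
N, d = 5000, 3
X = rng.normal(size=(N, d))
beta = np.array([1.0, -2.0, 0.5])
y = (rng.uniform(size=N) < 1 / (1 + np.exp(-X @ beta))).astype(float)

def loglik_terms(theta):
    """Per-datum logistic log-likelihood contributions."""
    z = X @ theta
    return y * z - np.log1p(np.exp(z))

full = loglik_terms(beta).sum()

# Weighted subsample: pick points with probability proportional to a crude
# "sensitivity" proxy (here ||x_i||), then reweight so the weighted sum
# is unbiased for the full-data log-likelihood.
m = 500
p = np.linalg.norm(X, axis=1)
p /= p.sum()
idx = rng.choice(N, size=m, replace=True, p=p)
w = 1.0 / (m * p[idx])
coreset_est = np.sum(w * loglik_terms(beta)[idx])
```

Real coreset constructions choose the sampling probabilities from rigorous sensitivity bounds, which is what gives the approximation guarantees uniform subsampling lacks.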

### Inference with approximate likelihoods

Helen Ogden (Southampton)

Helen presented some theoretical results for convergence of Laplace approximations to latent variable models and composite likelihood for the Ising model (arXiv 1601.07911). In both cases, she measures the approximation error via the distance between the approximate and exact score functions. For the Laplace approximation, the number of observations needs to grow at a rate proportional to the number of latent variables. She also showed that the reduced dependence approximation (RDA) has polynomial computational cost when the inverse temperature β is below the critical value. Some of these ideas have been implemented in her R package **glmmsr** (available on CRAN).
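As a reminder of the basic construction, here is a Laplace approximation to a one-dimensional integral, using a Gamma-function integrand so the exact answer (7! = 5040) is known:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Unnormalised log-density: a Gamma(a, 1)-shaped integrand on x > 0.
a = 8.0
def neg_log_f(x):
    return -((a - 1) * np.log(x) - x)

# Find the mode numerically, then the curvature at the mode.
opt = minimize_scalar(neg_log_f, bounds=(1e-6, 50), method="bounded")
x_hat = opt.x
h = (a - 1) / x_hat**2        # second derivative of neg_log_f

# Laplace approximation: f(x_hat) * sqrt(2*pi / h).
laplace = np.exp(-neg_log_f(x_hat)) * np.sqrt(2 * np.pi / h)

# "Exact" value by quadrature: Gamma(8) = 5040.
exact, _ = quad(lambda x: np.exp(-neg_log_f(x)), 0, np.inf)
```

Here a single Gaussian fitted at the mode is accurate to about one percent; Helen's results concern the much harder regime where the number of latent variables being integrated out grows with the data.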

### Scalable statistical inference with INLA

Håvard Rue (KAUST)

This talk was particularly interesting for the discussion of diminishing returns from sparse matrix representations as dimension increases. Integrated nested Laplace approximations (INLA) have enjoyed great success for approximate Bayesian inference on generalised linear models (GLM) or generalised additive models (GAM), particularly with 1D (HMM) or 2D (MRF) correlation structures.
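The sparse-matrix point can be illustrated directly: a 1D chain has a tridiagonal precision matrix whose Cholesky factor stays bidiagonal, while a 2D lattice of the same total dimension suffers substantial fill-in. A small numerical check (toy GMRF-style precisions, dense linear algebra for simplicity):

```python
import numpy as np

def precision_1d(n):
    """Tridiagonal precision of a Gaussian chain (plus a small ridge)."""
    Q = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return Q + 0.1 * np.eye(n)

def precision_2d(m):
    """Precision of an m x m lattice: Kronecker sum of two 1D chains."""
    Q1, I = precision_1d(m), np.eye(m)
    return np.kron(Q1, I) + np.kron(I, Q1)

def chol_nnz(Q):
    """Number of non-negligible entries in the Cholesky factor of Q."""
    L = np.linalg.cholesky(Q)
    return int(np.count_nonzero(np.abs(L) > 1e-12))

n = 400                               # same total dimension in both cases
nnz_1d = chol_nnz(precision_1d(n))    # stays O(n): the band is preserved
nnz_2d = chol_nnz(precision_2d(20))   # fill-in: bandwidth grows like sqrt(n)
```

The 1D factor keeps roughly two entries per row, while the 2D factor fills its whole band; in 3D the fill-in is worse still, which is one way to see the diminishing returns Håvard described.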

### Inference in generative models using the Wasserstein distance

Christian Robert (U. Paris Dauphine & U. Warwick)

Approximate Bayesian computation (ABC) is crucially dependent on the choice of distance function between the observations and pseudo-data. X’ian showed that the earth mover’s distance (EMD) or Wasserstein metric has some particularly useful properties in this context (arXiv 1701.05146). For some models, the Wasserstein metric can be computed directly from the parameter values, without any need for simulation of pseudo-data.
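In one dimension the Wasserstein distance between two equal-size samples is simply the mean absolute difference of their order statistics, which makes a toy ABC implementation straightforward (a sketch of the general idea, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)

def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size empirical samples:
    the mean absolute difference of the sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

# Observed data from N(mu=2, 1); infer mu by ABC rejection.
n_obs = 200
y_obs = rng.normal(2.0, 1.0, size=n_obs)

prior_draws = rng.uniform(-5, 5, size=5000)   # flat prior on mu
dists = np.array([
    wasserstein_1d(y_obs, rng.normal(mu, 1.0, size=n_obs))
    for mu in prior_draws
])

# Keep the 1% of prior draws whose pseudo-data are closest to y_obs.
accepted = prior_draws[dists <= np.quantile(dists, 0.01)]
posterior_mean = accepted.mean()
```

Comparing whole empirical distributions this way sidesteps the usual ABC problem of hand-picking summary statistics, which is one of the attractions X'ian emphasised.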

### Transferability: as easy as ABC?

Kerrie Mengersen (QUT)

Kerrie discussed the issue of geographic transferability for ecological models. The key question is how much a model trained in one specific context can be generalised to other settings, for example as an informative prior, through a hierarchical model, or in the experimental design. Difficulties arise when covariates are missing in one location, or measured in a different way. Approaches include history matching for partially-informative priors (arXiv 1605.08860) and decision-theoretic subsampling of data (*Stat. Sci.* 2017).