Scalable inference; statistical, algorithmic, computational aspects

October 6, 2017

In July I attended a month-long programme at the Isaac Newton Institute, Cambridge, organised by the i-like project: “Scalable inference; statistical, algorithmic, computational aspects.” Videos of selected talks are now available online, so I thought I would highlight a few that, in my opinion, are particularly worth watching.

IS type estimators based on approximate marginal MCMC

Matti Vihola (Jyväskylä)

Matti presented an importance sampling (IS) correction for MCMC chains that do not target the exact posterior distribution (arXiv 1609.02541). This approach has advantages over other “exact-approximate” methods, such as delayed acceptance (DA), because the IS correction can be performed offline and in parallel. It has great potential to be combined with Bayesian indirect likelihood (BIL) methods, such as my surrogate model for ABC or Gaussian process approximation for pseudo-marginal MCMC.
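To make the idea concrete, here is a minimal sketch of an IS correction applied to the output of a chain that targets an approximate density. It is a toy illustration, not the estimator from the paper: the two Gaussian densities, the random-walk Metropolis sampler and the function names (`log_target`, `log_approx`, `metropolis`, `is_corrected_mean`) are all assumptions chosen for the example. The point is that the weights are computed after the chain has finished, one state at a time, so that step could be done offline and in parallel.

```python
import numpy as np

def log_target(x):
    # "Exact" unnormalised log-density: standard normal (assumed for illustration).
    return -0.5 * x**2

def log_approx(x):
    # Cheaper approximate log-density: normal with standard deviation 1.5 (too wide).
    return -0.5 * (x / 1.5)**2

def metropolis(log_density, n_iter, step, rng):
    # Plain random-walk Metropolis targeting exp(log_density).
    x = 0.0
    lp = log_density(x)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()
        lp_prop = log_density(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

def is_corrected_mean(chain, f):
    # Self-normalised importance sampling: weight each state by target / approximation.
    log_w = log_target(chain) - log_approx(chain)
    w = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    return np.sum(w * f(chain)) / np.sum(w)

rng = np.random.default_rng(1)
chain = metropolis(log_approx, 50_000, 2.5, rng)   # the chain only ever sees the approximation

uncorrected = np.mean(chain**2)                     # estimates the variance under the approximation
corrected = is_corrected_mean(chain, lambda x: x**2)  # estimates the variance under the exact target
print(uncorrected, corrected)
```

In this toy setting the uncorrected estimate should sit near 2.25 (the variance of the approximation) and the corrected one near 1 (the variance of the exact target).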

Exact Bayesian Inference for Big Data: Single- and Multi-Core Approaches

Murray Pollock (Warwick)

Murray presented some algorithms for combining inference from multiple, distributed sub-posteriors. The essential idea is to run a coupled Markov chain for each sub-posterior. When the chains coalesce, this gives an unbiased sample from the combined posterior distribution. These parallel methods are related to the SCALE algorithm (arXiv 1609.03436).

Coresets for scalable Bayesian logistic regression

Tamara Broderick (MIT)

Tamara’s talk combined ideas from computer science for dimension reduction with statistical algorithms for Bayesian inference. A coreset is a weighted subsample of the data, which is intended to provide a low-dimensional representation while minimising information loss. This can give better results than naïve subsampling methods, such as stochastic gradient descent (SGD) or stochastic gradient Langevin dynamics (SGLD). More details are available in her NIPS 2016 paper and the article homepage.
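The data structure itself is simple: a set of indices plus one weight per retained point, plugged into a weighted log-likelihood. The sketch below shows only that structure, under loud assumptions: the simulated data, the helper names (`log_lik_terms`, `weighted_log_lik`) and, above all, the weights are mine. I use uniform weights N/M on a random subsample, which is exactly the naïve baseline and not the coreset construction from the talk, where the points and weights are chosen more carefully.

```python
import numpy as np

def log_lik_terms(theta, X, y):
    # Per-observation Bernoulli log-likelihood with a logit link, y in {0, 1}.
    z = X @ theta
    return y * z - np.logaddexp(0.0, z)

def weighted_log_lik(theta, X, y, weights):
    # Log-likelihood of a weighted dataset; unit weights recover the full-data value.
    return np.sum(weights * log_lik_terms(theta, X, y))

rng = np.random.default_rng(0)
N, M = 20_000, 500                      # full data size and subsample size (assumed)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta_true = np.array([-0.5, 1.0])
prob = 1.0 / (1.0 + np.exp(-(X @ theta_true)))
y = (rng.uniform(size=N) < prob).astype(float)

# Placeholder weighting: uniform subsample, each point standing in for N / M observations.
idx = rng.choice(N, size=M, replace=False)
weights = np.full(M, N / M)

full = weighted_log_lik(theta_true, X, y, np.ones(N))
approx = weighted_log_lik(theta_true, X[idx], y[idx], weights)
rel_error = abs(approx - full) / abs(full)
print(full, approx, rel_error)
```

Any inference algorithm that accepts `weighted_log_lik` in place of the full log-likelihood can then run on M points instead of N; a real coreset would change only how `idx` and `weights` are chosen.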

Inference with approximate likelihoods

Helen Ogden (Southampton)

Helen presented some theoretical results for convergence of Laplace approximations to latent variable models and composite likelihood for the Ising model (arXiv 1601.07911). In both cases, she measures the approximation error using the distance from the score function of the true model. For the Laplace approximation, the number of observations needs to grow at a rate proportional to the number of latent variables. She also showed that the reduced dependence approximation (RDA) has polynomial computational cost when the inverse temperature β is below the critical value. Some of these ideas have been implemented in her R package glmmsr (available on CRAN).

Scalable statistical inference with INLA

Håvard Rue (KAUST)

This talk was particularly interesting for the discussion of diminishing returns from sparse matrix representations as dimension increases. Integrated nested Laplace approximations (INLA) have enjoyed great success for approximate Bayesian inference on generalised linear models (GLM) or generalised additive models (GAM), particularly with 1D (HMM) or 2D (MRF) correlation structures.

Inference in generative models using the Wasserstein distance

Christian Robert (U. Paris Dauphine & U. Warwick)

Approximate Bayesian computation (ABC) is crucially dependent on the choice of distance function between the observations and pseudo-data. Xi’an showed that the earth mover’s distance (EMD) or Wasserstein metric has some particularly useful properties in this context (arXiv 1701.05146). For some models, the Wasserstein metric can be computed directly from the parameter values, without any need for simulation of pseudo-data.
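For univariate data with equal sample sizes, the Wasserstein distance between two empirical distributions reduces to comparing sorted values, which makes it easy to try as an ABC distance. The following is a minimal sketch under my own assumptions: a normal location model with known unit variance, a N(0, 5²) prior, and a rejection sampler that keeps the closest 1% of draws. The function names `wasserstein_1d` and `abc_rejection` are made up for the example and are not taken from the paper.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    # p-Wasserstein distance between two equal-size univariate samples:
    # match order statistics and average the p-th power of the gaps.
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y_sorted = np.sort(np.asarray(y, dtype=float))
    if x_sorted.shape != y_sorted.shape:
        raise ValueError("this sketch only handles samples of equal size")
    return np.mean(np.abs(x_sorted - y_sorted)**p)**(1.0 / p)

def abc_rejection(observed, n_draws, quantile, rng):
    # Assumed toy model: data ~ N(mu, 1) with prior mu ~ N(0, 5^2).
    mus = rng.normal(0.0, 5.0, size=n_draws)
    distances = np.array([
        wasserstein_1d(observed, rng.normal(mu, 1.0, size=observed.size))
        for mu in mus
    ])
    # Keep the parameter draws whose pseudo-data are closest to the observations.
    threshold = np.quantile(distances, quantile)
    return mus[distances <= threshold]

rng = np.random.default_rng(42)
observed = rng.normal(2.0, 1.0, size=100)
accepted = abc_rejection(observed, n_draws=5000, quantile=0.01, rng=rng)
print(len(accepted), accepted.mean())
```

No summary statistics are chosen anywhere in this sketch: the distance is computed on the full samples, which is part of the appeal. In more than one dimension the sorting trick no longer applies and the distance has to be computed or approximated some other way.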

Transferability: as easy as ABC?

Kerrie Mengersen (QUT)

Kerrie discussed the issue of geographic transferability for ecological models. The key question is how much a model trained in one specific context can be generalised to other settings, for example as an informative prior, through a hierarchical model, or in the experimental design. Difficulties arise when covariates are missing in one location, or measured in a different way. Approaches include history matching for partially-informative priors (arXiv 1605.08860) and decision-theoretic subsampling of data (Stat. Sci. 2017).

