Skip to content

Parametric Functional Approximate Bayesian (PFAB) algorithm

January 27, 2018

A new version 0.5-0 of my R package bayesImageS is now available on CRAN. To accompany it is a revision to my paper with Kerrie and Tony, “Scalable Bayesian inference for the inverse temperature of a hidden Potts model.” (Moores, Pettitt & Mengersen, arXiv:1503.08066v2). This paper introduces the parametric functional approximate Bayesian (PFAB) algorithm (the ‘p’ is silent…), which is a form of Bayesian indirect likelihood (BIL).

PFAB splits computation into 3 stages:

  1. Simulation for fixed \beta using Swendsen-Wang
  2. Fitting a parametric surrogate model using Stan
  3. Approximate posterior inference using Metropolis-within-Gibbs

Stage 1

For Stage 1, I used 2000 iterations of SW for each of 72 values of \beta, but this is really overkill for most applications. I chose 72 values because I happened to have a 36-core, hyperthreaded CPU available. Here I’ll just be running everything on my laptop (an i7 CPU with 4 hyperthreaded cores), so 28 values should be plenty. The idea is to have higher density closer to the critical temperature, where the variance (and hence the gradient of the score function) is greatest.

For our precomputation step, we need to know the image dimensions n and the number of labels k that we will use for pixel classification. We’ll be using the Lake of Menteith dataset from Bayesian Essentials with R (Marin & Robert, 2014):

iter <- 800
burn <- iter/4 + 1
n <- prod(dim(Menteith))
k <- 6

Greyscale satellite image of Lake Menteith

The precomputation step is usually the most expensive part, but for 100×100 pixels it should only take around 15 to 20 seconds:

bcrit <- log(1 + sqrt(k))
beta <- sort(c(seq(0,1,by=0.1),seq(1.05,1.15,by=0.05),
mask <- matrix(1, nrow=sqrt(n), ncol=sqrt(n))
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
maxS <- nrow(edges)
E0 <- maxS/k
V0 <- maxS*(1/k)*(1 - 1/k)

Embarassingly parallel, using all available CPU cores:

cores <- min(detectCores(), length(beta))
print(paste("Parallel computation using",cores,"CPU cores:",
            iter,"iterations for",length(beta),"values of beta."))
cl <- makeForkCluster(cores, outfile="")

[1] "Parallel computation using 4 CPU cores: 800 iterations for 28 values of beta."
socket cluster with 4 nodes on host ‘localhost’

Simulate from the prior to verify the critical value of \beta:

tm <- system.time(matu <- foreach(i=1:length(beta),
      .packages=c("bayesImageS"), .combine='cbind') %dopar% {
  res <- swNoData(beta[i],k,neigh,block,iter)
save(matu, file=paste0("n",sqrt(n),"k",k,"_counts.rda"))

user system elapsed
0.055 0.067 16.881

This shows the piecewise linear approximation that we used in our first paper (Moores et al., STCO 2015):



Stage 2

Instead, for Stage 2 we will use Stan to fit a parametric integral curve:

functions {
  vector ft(vector t, real tC, real e0, real ecrit, real v0, real vmaxLo, real vmaxHi, real phi1, real phi2) {
    vector[num_elements(t)] mu;
    real sqrtBcritPhi = sqrt(tC)*phi1;
    for (i in 1:num_elements(t)) {
      if (t[i] <= tC) {
        real sqrtBdiffPhi = sqrt(tC - t[i])*phi1;
        mu[i] = e0 + t[i]*v0 - ((2*(vmaxLo-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi));
      } else {
        real sqrtBdiff = sqrt(t[i] - tC);
        mu[i] = ecrit - ((2*vmaxHi)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2);
    return mu;

  vector dfdt(vector t, real tC, real v0, real vmaxLo, real vmaxHi, real r1, real r2) {
    vector[num_elements(t)] dmu;
    for (i in 1:num_elements(t)) {
      if (t[i] <= tC) {
        dmu[i] = v0 + (vmaxLo-v0)*exp(-r1*sqrt(tC - t[i]));
      } else {
        dmu[i] = vmaxHi*exp(-r2*sqrt(t[i] - tC));
    return dmu;
data {
  int<lower = 1> M;
  int<lower = 1> N;
  real<lower = 1> maxY;
  real<lower = 1> Vlim;
  real<lower = 0> e0;
  real<lower = 0> v0;
  real tcrit;
  matrix<lower=0, upper=maxY>[M,N] y;
  vector[M] t;
parameters {
  real<lower = 0> a;
  real<lower = 0> b;
  real<lower = e0, upper=maxY> ecrit;
  real<lower = 0, upper=Vlim> vmaxLo;
  real<lower = 0, upper=Vlim> vmaxHi;
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_var;
  curr_mu = ft(t, tcrit, e0, ecrit, v0, vmaxLo, vmaxHi, a, b);
  curr_var = dfdt(t, tcrit, v0, vmaxLo, vmaxHi, a, b);
model {
  for (i in 1:M) {
    y[i,] ~ normal(curr_mu[i], sqrt(curr_var[i]));

For comparison, see a previous blog post where I fitted a simple, logistic curve using Stan.

options(mc.cores = min(4, parallel::detectCores()))
dat <- list(M=length(beta), N=iter-burn+1, maxY=maxS, e0=E0, v0=V0,
            Vlim=2*maxS*log(maxS)/pi, tcrit=bcrit, y=t(matu[burn:iter,]), t=beta)
tm2 <- system.time(fit <- sampling(PFAB, data = dat, verbose=TRUE, iter=5000,
                            control = list(adapt_delta = 0.9, max_treedepth=20)))
print(fit, pars=c("a","b","ecrit","vmaxLo","vmaxHi"), digits=3)


COMPILING MODEL ‘stan-1aa3ff1f583’ NOW.

starting worker pid=7953 on localhost:11107 at 21:01:28.616
starting worker pid=7961 on localhost:11107 at 21:01:28.832
starting worker pid=7969 on localhost:11107 at 21:01:29.056
starting worker pid=7977 on localhost:11107 at 21:01:29.267

Gradient evaluation took 0.000253 seconds
1000 transitions using 10 leapfrog steps per transition would take 2.53 seconds.
Adjust your expectations accordingly!

Elapsed Time: 317.369 seconds (Warm-up)
154.02 seconds (Sampling)
471.389 seconds (Total)

Roughly 8 minutes to fit the surrogate model makes this the most expensive step, but only because the first step was so fast. For much larger images (a megapixel or more), it will be the other way around – as shown in the paper.

ft <- function(t, tC, e0, ecrit, v0, vmax1, vmax2, phi1, phi2) {
  sqrtBcritPhi = sqrt(tC)*phi1
  fval <- numeric(length(t))
  for (i in 1:length(t)) {
    if (t[i] <= tC) {
      sqrtBdiffPhi = sqrt(tC - t[i])*phi1
      fval[i] <- e0 + t[i]*v0 - ((2*(vmax1-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi));
    } else {
      sqrtBdiff = sqrt(t[i] - tC)
      fval[i] <- ecrit - ((2*vmax2)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2);

idx <-,size=20)
lines(beta, ft(beta, bcrit, E0, 14237, V0,
               59019, 124668, 4.556, 6.691), <span 				data-mce-type="bookmark" 				id="mce_SELREST_start" 				data-mce-style="overflow:hidden;line-height:0" 				style="overflow:hidden;line-height:0" 			></span>col=4, lwd=2)

Parametric Functional Approximation

To really see how well this approximation fits the true model, we need to look at the residuals:

residMx <- matrix(nrow=iter-burn+1, ncol=length(beta))
for (b in 1:length(beta)) {
  residMx[,b] <- matu[burn:iter,b] - ft(beta[b], bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691)

dfdt <- function(t, tC, V0, Vmax1, Vmax2, r1, r2) {
  ifelse(t < tC,
         V0 + (Vmax1-V0)*exp(-r1*sqrt(tC - t)),
         Vmax2*exp(-r2*sqrt(t - tC)))


x <- sort(c(seq(0,3,by=0.01),bcrit))
lines(x, 3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)
lines(x, -3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)


This shows that 28 values of \beta were enough to obtain a high-quality fit between the true model and the surrogate.

Stage 3

Now that we have our surrogate model, we can proceed to the final stage, which is to perform image segmentation using mcmcPotts:

mh <- list(algorithm="aux", bandwidth=0.02, Vmax1=59019,
           Vmax2=124668, E0=E0, Ecrit=14237, phi1=4.556,
           phi2=6.691, factor=1, bcrit=bcrit, V0=V0)
priors <- list()
priors$k <- k
priors$mu <- c(0, 50, 100, 150, 200, 250)
priors$ <- rep(10,k)
priors$sigma <- rep(20,k)
priors$ <- rep(5, k)
priors$beta <- c(0,3)
iter <- 1e4
burn <- iter/2
y <- as.vector(as.matrix(Menteith))
tm3 <- system.time(resPFAB <-

user system elapsed
52.332 0.638 13.666

Now we compare with the approximate exchange algorithm:

mh <- list(algorithm="ex", bandwidth=0.02, auxiliary=200)
tm4 <- system.time(resAEA <- 

user system elapsed
7429.956 689.200 3236.957

Over 200 times speedup, in comparison to AEA. There is reasonably good agreement between the posterior distributions:

densPFAB <- density(resPFAB$beta[burn:iter])
densAEA <- density(resAEA$beta[burn:iter])
plot(densAEA, col=4, lty=2, lwd=2, main="", xlab=expression(beta),
lines(densPFAB, col=2, lty=3, lwd=3)



From → Imaging, R

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Ella Kaye on Ella Kaye

Computational Bayesian statistics

Bayes' Food Cake

A bit of statistics, a bit of cakes. - Blogs to Learn R from the Community

Computational Bayesian statistics

Richard Everitt's blog

Computational Bayesian statistics

Let's Look at the Figures

David Firth's blog

Nicholas Tierney

Computational Bayesian statistics

Mad (Data) Scientist

Musings, useful code etc. on R and data science

Another Astrostatistics Blog

The random musings of a reformed astronomer ...

Darren Wilkinson's research blog

Statistics, computing, data science, Bayes, stochastic modelling, systems biology and bioinformatics

(badness 10000)

Computational Bayesian statistics

Igor Kromin

Computational Bayesian statistics


I can't get no

Xi'an's Og

an attempt at bloggin, nothing more...

Sam Clifford

Postdoctoral Fellow, Bayesian Statistics, Aerosol Science

%d bloggers like this: