A new version 0.5-0 of my R package bayesImageS is now available on CRAN. To accompany it is a revision to my paper with Kerrie and Tony, “Scalable Bayesian inference for the inverse temperature of a hidden Potts model.” (Moores, Pettitt & Mengersen, arXiv:1503.08066v2). This paper introduces the parametric functional approximate Bayesian (PFAB) algorithm (the ‘p’ is silent…), which is a form of Bayesian indirect likelihood (BIL).

PFAB splits computation into 3 stages:

1. Simulation for fixed $\beta$ using Swendsen-Wang
2. Fitting a parametric surrogate model using Stan
3. Approximate posterior inference using Metropolis-within-Gibbs

## Stage 1

For Stage 1, I used 2000 iterations of SW for each of 72 values of $\beta$, but this is really overkill for most applications. I chose 72 values because I happened to have a 36-core, hyperthreaded CPU available. Here I’ll just be running everything on my laptop (an i7 CPU with 4 hyperthreaded cores), so 28 values should be plenty. The idea is to have higher density closer to the critical temperature, where the variance (and hence the gradient of the score function) is greatest.

For our precomputation step, we need to know the image dimensions $n$ and the number of labels $k$ that we will use for pixel classification. We’ll be using the Lake of Menteith dataset from Bayesian Essentials with R (Marin & Robert, 2014):

library(bayess)
data("Menteith")
iter <- 800
burn <- iter/4 + 1
n <- prod(dim(Menteith))
k <- 6
image(as.matrix(Menteith),asp=1,xaxt='n',yaxt='n',
col=gray(0:255/255))


Greyscale satellite image of Lake Menteith

The precomputation step is usually the most expensive part, but for 100×100 pixels it should only take around 15 to 20 seconds:

library(bayesImageS)
bcrit <- log(1 + sqrt(k))
beta <- sort(c(seq(0,1,by=0.1),seq(1.05,1.15,by=0.05),
bcrit-0.05,bcrit-0.02,bcrit+0.02,
seq(1.3,1.4,by=0.05),seq(1.5,2,by=0.1),2.5,3))
maxS <- nrow(edges)
E0 <- maxS/k
V0 <- maxS*(1/k)*(1 - 1/k)


Embarassingly parallel, using all available CPU cores:

cores <- min(detectCores(), length(beta))
print(paste("Parallel computation using",cores,"CPU cores:",
iter,"iterations for",length(beta),"values of beta."))
cl <- makeForkCluster(cores, outfile="")
print(cl)
clusterSetRNGStream(cl)
registerDoParallel(cl)


 [1] "Parallel computation using 4 CPU cores: 800 iterations for 28 values of beta." socket cluster with 4 nodes on host ‘localhost’ 

Simulate from the prior to verify the critical value of $\beta$:

tm <- system.time(matu <- foreach(i=1:length(beta),
.packages=c("bayesImageS"), .combine='cbind') %dopar% {
res <- swNoData(beta[i],k,neigh,block,iter)
res$sum }) print(tm) save(matu, file=paste0("n",sqrt(n),"k",k,"_counts.rda")) stopCluster(cl)   user system elapsed 0.055 0.067 16.881  This shows the piecewise linear approximation that we used in our first paper (Moores et al., STCO 2015): lrcst=approxfun(beta,colMeans(matu)) plot(beta,colMeans(matu),main="", xlab=expression(beta),ylab=expression(S(z)),asp=1) curve(lrcst,0,max(beta),add=T,col="blue") abline(v=bcrit,col="red",lty=3) abline(h=maxS,col=2,lty=2) points(0,E0,col=2,pch=2)  ## Stage 2 Instead, for Stage 2 we will use Stan to fit a parametric integral curve: functions { vector ft(vector t, real tC, real e0, real ecrit, real v0, real vmaxLo, real vmaxHi, real phi1, real phi2) { vector[num_elements(t)] mu; real sqrtBcritPhi = sqrt(tC)*phi1; for (i in 1:num_elements(t)) { if (t[i] <= tC) { real sqrtBdiffPhi = sqrt(tC - t[i])*phi1; mu[i] = e0 + t[i]*v0 - ((2*(vmaxLo-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi)); } else { real sqrtBdiff = sqrt(t[i] - tC); mu[i] = ecrit - ((2*vmaxHi)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2); } } return mu; } vector dfdt(vector t, real tC, real v0, real vmaxLo, real vmaxHi, real r1, real r2) { vector[num_elements(t)] dmu; for (i in 1:num_elements(t)) { if (t[i] <= tC) { dmu[i] = v0 + (vmaxLo-v0)*exp(-r1*sqrt(tC - t[i])); } else { dmu[i] = vmaxHi*exp(-r2*sqrt(t[i] - tC)); } } return dmu; } } data { int<lower = 1> M; int<lower = 1> N; real<lower = 1> maxY; real<lower = 1> Vlim; real<lower = 0> e0; real<lower = 0> v0; real tcrit; matrix<lower=0, upper=maxY>[M,N] y; vector[M] t; } parameters { real<lower = 0> a; real<lower = 0> b; real<lower = e0, upper=maxY> ecrit; real<lower = 0, upper=Vlim> vmaxLo; real<lower = 0, upper=Vlim> vmaxHi; } transformed parameters { vector[M] curr_mu; vector[M] curr_var; curr_mu = ft(t, tcrit, e0, ecrit, v0, vmaxLo, vmaxHi, a, b); curr_var = dfdt(t, tcrit, v0, vmaxLo, vmaxHi, a, b); } model { for (i in 1:M) { y[i,] ~ normal(curr_mu[i], sqrt(curr_var[i])); } }  For comparison, see a previous blog post where I fitted a simple, logistic curve using Stan. library(rstan) options(mc.cores = min(4, parallel::detectCores())) dat <- list(M=length(beta), N=iter-burn+1, maxY=maxS, e0=E0, v0=V0, Vlim=2*maxS*log(maxS)/pi, tcrit=bcrit, y=t(matu[burn:iter,]), t=beta) tm2 <- system.time(fit <- sampling(PFAB, data = dat, verbose=TRUE, iter=5000, control = list(adapt_delta = 0.9, max_treedepth=20))) print(fit, pars=c("a","b","ecrit","vmaxLo","vmaxHi"), digits=3)   CHECKING DATA AND PREPROCESSING FOR MODEL 'stan-1aa3ff1f583' NOW. COMPILING MODEL ‘stan-1aa3ff1f583’ NOW. STARTING SAMPLER FOR MODEL ‘stan-1aa3ff1f583’ NOW. starting worker pid=7953 on localhost:11107 at 21:01:28.616 starting worker pid=7961 on localhost:11107 at 21:01:28.832 starting worker pid=7969 on localhost:11107 at 21:01:29.056 starting worker pid=7977 on localhost:11107 at 21:01:29.267 Gradient evaluation took 0.000253 seconds 1000 transitions using 10 leapfrog steps per transition would take 2.53 seconds. Adjust your expectations accordingly! Elapsed Time: 317.369 seconds (Warm-up) 154.02 seconds (Sampling) 471.389 seconds (Total) Roughly 8 minutes to fit the surrogate model makes this the most expensive step, but only because the first step was so fast. For much larger images (a megapixel or more), it will be the other way around – as shown in the paper. ft <- function(t, tC, e0, ecrit, v0, vmax1, vmax2, phi1, phi2) { sqrtBcritPhi = sqrt(tC)*phi1 fval <- numeric(length(t)) for (i in 1:length(t)) { if (t[i] <= tC) { sqrtBdiffPhi = sqrt(tC - t[i])*phi1 fval[i] <- e0 + t[i]*v0 - ((2*(vmax1-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi)); } else { sqrtBdiff = sqrt(t[i] - tC) fval[i] <- ecrit - ((2*vmax2)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2); } } return(fval) } plot(range(beta),range(matu),type='n', xlab=expression(beta),ylab=expression(S(z))) idx <- burn+sample.int(iter-burn+1,size=20) abline(v=bcrit,col="red",lty=3) abline(h=maxS,col=2,lty=2) points(rep(beta,each=20),matu[idx,],pch=20) lines(beta, ft(beta, bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691), <span data-mce-type="bookmark" id="mce_SELREST_start" data-mce-style="overflow:hidden;line-height:0" style="overflow:hidden;line-height:0" ></span>col=4, lwd=2)  Parametric Functional Approximation To really see how well this approximation fits the true model, we need to look at the residuals: residMx <- matrix(nrow=iter-burn+1, ncol=length(beta)) for (b in 1:length(beta)) { residMx[,b] <- matu[burn:iter,b] - ft(beta[b], bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691) } dfdt <- function(t, tC, V0, Vmax1, Vmax2, r1, r2) { ifelse(t < tC, V0 + (Vmax1-V0)*exp(-r1*sqrt(tC - t)), Vmax2*exp(-r2*sqrt(t - tC))) } plot(range(beta),range(residMx),type='n',xlab=expression(beta),ylab="residuals") abline(h=0,lty=2,col=4,lwd=2) points(rep(beta,each=iter-burn+1),residMx,pch='.',cex=3) x <- sort(c(seq(0,3,by=0.01),bcrit)) lines(x, 3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2) lines(x, -3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)  This shows that 28 values of $\beta$ were enough to obtain a high-quality fit between the true model and the surrogate. ## Stage 3 Now that we have our surrogate model, we can proceed to the final stage, which is to perform image segmentation using mcmcPotts: mh <- list(algorithm="aux", bandwidth=0.02, Vmax1=59019, Vmax2=124668, E0=E0, Ecrit=14237, phi1=4.556, phi2=6.691, factor=1, bcrit=bcrit, V0=V0) priors <- list() priors$k <- k
priors$mu <- c(0, 50, 100, 150, 200, 250) priors$mu.sd <- rep(10,k)
priors$sigma <- rep(20,k) priors$sigma.nu <- rep(5, k)
priors$beta <- c(0,3) iter <- 1e4 burn <- iter/2 y <- as.vector(as.matrix(Menteith)) tm3 <- system.time(resPFAB <- mcmcPotts(y,neigh,block,priors,mh,iter,burn)) print(tm3)   user system elapsed 52.332 0.638 13.666  Now we compare with the approximate exchange algorithm: mh <- list(algorithm="ex", bandwidth=0.02, auxiliary=200) tm4 <- system.time(resAEA <- mcmcPotts(y,neigh,block,priors,mh,iter,burn)) print(tm4)   user system elapsed 7429.956 689.200 3236.957  Over 200 times speedup, in comparison to AEA. There is reasonably good agreement between the posterior distributions: densPFAB <- density(resPFAB$beta[burn:iter])
densAEA <- density(resAEA$beta[burn:iter]) plot(densAEA, col=4, lty=2, lwd=2, main="", xlab=expression(beta), xlim=range(resPFAB$beta[burn:iter],resAEA\$beta[burn:iter]))
lines(densPFAB, col=2, lty=3, lwd=3)
abline(h=0,lty=2)
legend("topright",legend=c("AEA","PFAB"),col=c(4,2),lty=c(2,3),
lwd=3)


From → Imaging, R

Ella Kaye on Ella Kaye

Computational Bayesian statistics

Bayes' Food Cake

A bit of statistics, a bit of cakes.

RWeekly.org - Blogs to Learn R from the Community

Computational Bayesian statistics

Richard Everitt's blog

Computational Bayesian statistics

Let's Look at the Figures

David Firth's blog

Nicholas Tierney

Computational Bayesian statistics

Sweet Tea, Science

Two southern scientistas will be bringing you all that is awesome in STEM as we complete our PhDs. Ecology, statistics, sass.

Musings, useful code etc. on R and data science

Another Astrostatistics Blog

The random musings of a reformed astronomer ...

Darren Wilkinson's research blog

Statistics, computing, data science, Bayes, stochastic modelling, systems biology and bioinformatics

Computational Bayesian statistics

Igor Kromin

Computational Bayesian statistics

Statisfaction

I can't get no

Xi'an's Og

an attempt at bloggin, nothing more...

Sam Clifford

Postdoctoral Fellow, Bayesian Statistics, Aerosol Science