There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.

]]>

- Google Chrome
- Dropbox
- TeX Live 2018 (including TexWorks)
- Microsoft R Open 3.5.0 (including MKL)
- Rtools34
- RStudio Desktop 1.1.453
- Java SE JDK 8u171 with NetBeans 8.2 IDE (64 bit)
- JabRef (64 bit)
- Microsoft Office Home & Student 2016 (64 bit)
- Adobe Acrobat Reader DC
- PuTTY 0.7 (64 bit)
- WinSCP 5.13.3
- Git 2.18.0 for Windows (64 bit)

With all of this installed and my Dropbox synced, I have 197 GB of free space on my 446 GB solid-state drive.

]]>

This will be my farewell tour of the UK, as I’ll be relocating back to Australia after an amazing four years as a postdoc at the University of Warwick. After UseR!, I’ll be taking up a lectureship in the School of Mathematics and Statistics and the National Institute for Applied Statistics Research Australia (NIASRA) at the University of Wollongong.

ABC in Edinburgh, Sunday June 24

The inverse temperature parameter of the Potts model governs the strength of spatial cohesion and therefore has a major influence over the resulting model fit. A difficulty arises from the dependence of an intractable normalising constant on the value of this parameter and thus there is no closed-form solution for sampling from the posterior distribution directly. There are a variety of computational approaches for sampling from the posterior without evaluating the normalising constant, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space, such as images with a million or more pixels. We introduce a parametric surrogate model, which approximates the score function using an integral curve. Our surrogate model incorporates known properties of the likelihood, such as heteroskedasticity and critical temperature. We demonstrate this method using synthetic data as well as remotely-sensed imagery from the Landsat-8 satellite. We achieve up to a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. An open source implementation of our algorithm is available in the R package `bayesImageS’.

Moores, Pettitt & Mengersen (2015; v2 2018) “Scalable Bayesian inference for the inverse temperature of a hidden Potts model” arXiv:1503.08066 [stat.CO]

ISBA World Meeting, University of Edinburgh, Monday June 25

Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, comprised by the locations and amplitudes of the peaks. The presence of a large, non-uniform background presents a major challenge to analysis of these spectra. We introduce a sequential Monte Carlo (SMC) algorithm to separate the observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian, or pseudo-Voigt functions, while the baseline is estimated using a penalised cubic spline. Our model-based approach accounts for differences in resolution and experimental conditions. We incorporate prior information to improve identifiability and regularise the solution. By utilising this representation in a Bayesian functional regression, we can quantify the relationship between molecular concentration and peak intensity, resulting in an improved estimate of the limit of detection. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. These methods have been implemented as an R package, using RcppEigen and OpenMP.

Moores, Gracie, Carson, Faulds, Graham & Girolami (2016; v2 2018) “Bayesian modelling and quantification of Raman spectroscopy” arXiv:1604.07299 [stat.AP]

]]>

Depending on your configuration, you might need to edit the following file:

/Library/Frameworks/R.framework/Resources/etc/Makeconf

and change this line:

MAIN_LDFLAGS = -fopenmp

to something like this (depending where you installed CUDA):

MAIN_LDFLAGS = -L/usr/local/cuda/lib

This fixes the following error from nvcc:

** arch - /usr/local/cuda/bin/nvcc -shared -fopenmp -L/usr/local/lib -F/Library/Frameworks/R.framework/.. -framework R -lpcre -llzma -lbz2 -lz -licucore -lm -liconv -lpcre -llzma -lbz2 -lz -licucore -lm -liconv -lcublas -lnvrtc -lcuda rinterface.o mi.o sort.o granger.o qrdecomp.o correlation.o hcluster.o distance.o matmult.o lsfit.o kendall.o cuseful.o -o gputools.so nvcc fatal : Unknown option 'fopenmp' make: *** [gputools.so] Error 1 ERROR: compilation failed for package ‘gputools’

**Note**: this is probably why the package was removed from CRAN…

You might also need to edit ~/.R/Makevars if you followed my previous instructions on how to compile parallel OpenMP code on macOS X.

There is a second line that also causes problems with nvcc:

LIBR = -F/Library/Frameworks/R.framework/.. -framework R

Thanks to this post on StackExchange, which references this post in the nVidia forum, this line should be changed to:

LIBR = -Xlinker -framework,R

Finally, remember to set the following environment variables:

export CUDA_HOME=/usr/local/cuda export DYLD_LIBRARY_PATH=/usr/local/cuda/lib/:$DYLD_LIBRARY_PATH

**Final note**: system-wide changes to Makeconf are generally a *very* bad idea. The instructions above are likely to break compilation for any other (non-CUDA) R packages. Therefore, I would recommend reverting all of these changes once **gputools** has been successfully installed. Alternatively, you might want to investigate other R packages that provide CUDA support…

More details about the model and SMC algorithm are available in my preprint on arXiv (Moores et al., 2006; v2 2018). The following gives an example of applying **serrsBayes** to surface-enhanced Raman spectroscopy (SERS) from a previous paper (Gracie et al., 2016).

This is a type of functional data analysis (Ramsay et al., 2009), since the discretised spectrum is represented using latent (unobserved), continuous functions. The background fluorescence is estimated using a penalised B-spline (Wood, 2017), while the peaks can be modelled as Gaussian, Lorentzian, or pseudo-Voigt functions.

The Voigt function is a *convolution* of a Gaussian and a Lorentzian: . It has an additional parameter that equals 0 for pure Gaussian and 1 for Lorentzian:

where is the amplitude of peak ; is the peak location; and is the broadening. The horizontal axis of a Raman spectrum is measured in wavenumbers , with units of inverse centimetres (). The vertical axis is measured in arbitrary units (a.u.), since the intensity of the Raman signal depends on the properties of the spectrometer.

We can download some SERS spectra in a zip file:

tmp <- tempfile() download.file("https://pure.strath.ac.uk/portal/files/43595106/Figure_2.zip", tmp) tmp2 <- unzip(tmp, "Figure 2/T20 SERS spectra/T20_1_ REP1 Well_A1.SPC")

trying URL 'https://pure.strath.ac.uk/portal/files/43595106/Figure_2.zip'

downloaded 270 KB

This data is in the binary SPC file format used by Grams/AI. Fortunately, we can use the R package **hyperSpec** to read this file and plot the spectrum:

library(hyperSpec) spcT20 <- read.spc (tmp2) plot(spcT20[1,], col=4, wl.range=600~1800, title.args=list(main="Raman Spectrum of TAMRA+DNA")) spectra <- spcT20[1,,600~1800] wavenumbers <- wl(spectra) nWL <- length(wavenumbers)

We will use the same priors that were described in the paper (Moores et al., 2016), including the TD-DFT peak locations from Watanabe et al. (2005):

peakLocations <- c(615, 631, 664, 673, 702, 705, 771, 819, 895, 923, 1014, 1047, 1049, 1084, 1125, 1175, 1192, 1273, 1291, 1307, 1351, 1388, 1390, 1419, 1458, 1505, 1530, 1577, 1601, 1615, 1652, 1716) nPK <- length(peakLocations) priors <- list(loc.mu=peakLocations, loc.sd=rep(50,nPK), scaG.mu=log(16.47) - (0.34^2)/2, scaG.sd=0.34, scaL.mu=log(25.27) - (0.4^2)/2, scaL.sd=0.4, noise.nu=5, noise.sd=50, bl.smooth=1, bl.knots=121)

Now we run the SMC algorithm to fit the model:

library(serrsBayes) tm <- system.time(result <- fitVoigtPeaksSMC(wavenumbers, as.matrix(spectra), priors, npart=2000)) result$time <- tm save(result, file="Figure 2/result.rda")

[1] "SMC with 1 observations at 1 unique concentrations, 2000 particles, and 2401 wavenumbers."

[1] "Step 0: computing 125 B-spline basis functions (r=10) took 0.28sec."

[1] "Mean noise parameter sigma is now 60.3304671005565"

[1] "Mean spline penalty lambda is now 1"

[1] "Step 1: initialization for 32 Voigt peaks took 24.959 sec."

[1] "Reweighting took 1.208sec. for ESS 1800.80025019536 with new kappa 0.00096893310546875."

[1] "Iteration 2 took 253.487sec. for 10 MCMC loops (acceptance rate 0.3053)"

[1] "Reweighting took 1.07499999999999sec. for ESS 1621.343255666 with new kappa 0.00144911924144253."

. . .

[1] "Iteration 239 took 250.380000000005sec. for 10 MCMC loops (acceptance rate 0.2247)"

[1] "Reweighting took 0.0559999999968568sec. for ESS 1270.7842854632 with new kappa 1."

[1] "Iteration 240 took 249.332999999999sec. for 10 MCMC loops (acceptance rate 0.2313)"

The default values for the number of particles, Markov chain steps, and learning rate can be somewhat conservative, depending on the application. Unfortunately, the new function fitVoigtPeaksSMC has not been parallelised yet, so it only runs on a single core. Thus, it can take a long time to fit the model with 34 peaks and 2401 wavenumbers:

print(paste(result$time["elapsed"]/3600,"hours for",length(result$ess),"SMC iterations."))

[1] "16.4389 hours for 240 SMC iterations."

The downside of choosing smaller values for these tuning parameters is that you run the risk of the SMC collapsing. The quality of the particle distribution deteriorates with each iteration, as measured by the effective sample size (ESS):

plot.ts(result$ess, ylab="ESS", main="Effective Sample Size", xlab="SMC iteration") abline(h=length(result$sigma)/2, col=4, lty=2) abline(h=0,lty=2)

Note: this is very bad! The variance of the importance sampling estimator is unbounded in this case. The resampling step is intended to refresh the particles, but this introduces duplicates into the population. The Metropolis-Hastings (M-H) steps move some of the particles, but the bandwidths of the random walk proposals are chosen adaptively, based on the particle distribution. If this degenerates too far, then the M-H acceptance rate will also fall to zero:

If SMC collapses, the best solution is to increase the number of particles and run it again. Thus, choosing a conservative number to begin with is a sensible strategy. With 2000 particles and 10 M-H steps per SMC iteration, the algorithm converges to the target distribution:

A subsample of particles can be used to plot the posterior distribution of the baseline and peaks:

samp.idx <- sample.int(length(result$weights), 50, prob=result$weights) samp.mat <- resid.mat <- matrix(0,nrow=length(samp.idx), ncol=nWL) samp.sigi <- samp.lambda <- numeric(length=nrow(samp.mat)) spectra <- as.matrix(spectra) plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity") for (pt in 1:length(samp.idx)) { k <- samp.idx[pt] samp.mat[pt,] <- mixedVoigt(result$location[k,], result$scale_G[k,], result$scale_L[k,], result$beta[k,], wavenumbers) samp.sigi[pt] <- result$sigma[k] samp.lambda[pt] <- result$lambda[k] Obsi <- spectra[1,] - samp.mat[pt,] g0_Cal <- length(Obsi) * samp.lambda[pt] * result$priors$bl.precision gi_Cal <- crossprod(result$priors$bl.basis) + g0_Cal mi_Cal <- as.vector(solve(gi_Cal, crossprod(result$priors$bl.basis, Obsi))) bl.est <- result$priors$bl.basis %*% mi_Cal # smoothed residuals = estimated basline lines(wavenumbers, bl.est, col="#C3000020") lines(wavenumbers, bl.est + samp.mat[pt,], col="#0000C30F") resid.mat[pt,] <- Obsi - bl.est[,1] } title(main="Baseline for TAMRA")

Notice that the uncertainty in the baseline is greatest where the peaks are bunched close together, which is exactly what we would expect. This is also reflected in uncertainty of the spectral signature:

plot(range(wavenumbers), range(samp.mat), type='n', xlab="Raman offset", ylab="Intensity") abline(h=0,lty=2) for (pt in 1:length(samp.idx)) { lines(wavenumbers, samp.mat[pt,], col="#0000C330") lines(wavenumbers, resid.mat[pt,] + samp.mat[pt,], col="#00000020") } title(main="Spectral Signature")

Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. 2006. “Sequential Monte Carlo Samplers.” *J. R. Stat. Soc. Ser. B* 68 (3): 411–36. doi:10.1111/j.1467-9868.2006.00553.x.

Gracie, K., M. Moores, W. E. Smith, Kerry Harding, M. Girolami, D. Graham, and K. Faulds. 2016. “Preferential Attachment of Specific Fluorescent Dyes and Dye Labelled DNA Sequences in a SERS Multiplex.” *Anal. Chem.* 88 (2): 1147–53. doi:10.1021/acs.analchem.5b02776.

Jacob, Pierre E., Lawrence M. Murray, and Sylvain Rubenthaler. 2015. “Path Storage in the Particle Filter.” *Stat. Comput.* 25 (2): 487–96. doi:10.1007/s11222-013-9445-x.

Lee, Anthony, and Nick Whiteley. 2015. “Variance Estimation in the Particle Filter.” *arXiv Preprint arXiv:1509.00394 [Stat.CO]*. https://arxiv.org/abs/1509.00394.

Moores, M., K. Gracie, J. Carson, K. Faulds, D. Graham, and M. Girolami. 2016. “Bayesian Modelling and Quantification of Raman Spectroscopy.” *arXiv Preprint arXiv:1604.07299 [Stat.AP]*. http://arxiv.org/abs/1604.07299.

Ramsay, Jim O., Giles Hooker, and Spencer Graves. 2009. *Functional Data Analysis with R and MATLAB*. Use R! New York: Springer. doi:10.1007/978-0-387-98185-7.

Watanabe, Hiroyuki, Norihiko Hayazawa, Yasushi Inouye, and Satoshi Kawata. 2005. “DFT Vibrational Calculations of Rhodamine 6g Adsorbed on Silver: Analysis of Tip-Enhanced Raman Spectroscopy.” *J. Phys. Chem. B* 109 (11): 5012–20. doi:10.1021/jp045771u.

Wood, Simon N. 2017. *Generalized Additive Models: An Introduction with R*. 2nd ed. Boca Raton, FL, USA: Chapman & Hall/CRC Press. https://people.maths.bris.ac.uk/~sw15190/igam/index.html.

]]>

If you want to destroy my sweater

Hold this thread as I walk away

*Undone — Weezer*

I received an unexpected email about the new version 0.5-0 of bayesImageS:

Dear maintainer,

Please see the problems shown on

<https://cran.r-project.org/web/checks/check_results_bayesImageS.html>.Please correct before 2018-02-11 to safely retain your package on CRAN.

*(so unexpected, it was actually filtered into my junk email folder…)*

Version: 0.5-0 Check: Rd cross-references Result: WARN Unknown package ‘PottsUtils’ in Rd xrefs

Unknown? I could have sworn that package was available on CRAN the last time I checked!

Package ‘PottsUtils’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as it depends on the archived ‘miscF’.

Oh, dear…

Package ‘miscF’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as it depends on the archived ‘BayesBridge’, and on the non-portable ‘BRugs’.

It’s a massacre!

Package ‘BayesBridge’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as no corrections were received despite reminders.

The hastily-constructed v0.5-1 of **bayesImageS** removes all Rd cross-references to **PottsUtils** and **mritc**, which corrects all of the NOTEs and WARNings from CRAN. The original version 0.2-1 of **PottsUtils** was released on CRAN in April 2011, 2 months after I started my PhD. I’d like to thank the package author, Dai Feng, and his PhD supervisor, Prof. Luke Tierney, for releasing their software as open source and for maintaining it for the past 7 years. It was a big reason why I chose R rather than Python for my work in image analysis. *In memoriam.*

As a warning to other package authors, be careful what dependencies you choose to include. Also, make sure emails from @R-project.org aren’t filtered as spam. At best, you’re only ever 2 weeks from having your R package permanently removed from the CRAN repository!

]]>

PFAB splits computation into 3 stages:

- Simulation for fixed using Swendsen-Wang
- Fitting a parametric surrogate model using Stan
- Approximate posterior inference using Metropolis-within-Gibbs

For **Stage 1**, I used 2000 iterations of SW for each of 72 values of , but this is really overkill for most applications. I chose 72 values because I happened to have a 36-core, hyperthreaded CPU available. Here I’ll just be running everything on my laptop (an *i*7 CPU with 4 hyperthreaded cores), so 28 values should be plenty. The idea is to have higher density closer to the critical temperature, where the variance (and hence the gradient of the score function) is greatest.

For our precomputation step, we need to know the image dimensions and the number of labels that we will use for pixel classification. We’ll be using the Lake of Menteith dataset from Bayesian Essentials with R (Marin & Robert, 2014):

library(bayess) data("Menteith") iter <- 800 burn <- iter/4 + 1 n <- prod(dim(Menteith)) k <- 6 image(as.matrix(Menteith),asp=1,xaxt='n',yaxt='n', col=gray(0:255/255))

The precomputation step is usually the most expensive part, but for 100×100 pixels it should only take around 15 to 20 seconds:

library(bayesImageS) bcrit <- log(1 + sqrt(k)) beta <- sort(c(seq(0,1,by=0.1),seq(1.05,1.15,by=0.05), bcrit-0.05,bcrit-0.02,bcrit+0.02, seq(1.3,1.4,by=0.05),seq(1.5,2,by=0.1),2.5,3)) mask <- matrix(1, nrow=sqrt(n), ncol=sqrt(n)) neigh <- getNeighbors(mask, c(2,2,0,0)) block <- getBlocks(mask, 2) edges <- getEdges(mask, c(2,2,0,0)) maxS <- nrow(edges) E0 <- maxS/k V0 <- maxS*(1/k)*(1 - 1/k)

Embarassingly parallel, using all available CPU cores:

cores <- min(detectCores(), length(beta)) print(paste("Parallel computation using",cores,"CPU cores:", iter,"iterations for",length(beta),"values of beta.")) cl <- makeForkCluster(cores, outfile="") print(cl) clusterSetRNGStream(cl) registerDoParallel(cl)

[1] "Parallel computation using 4 CPU cores: 800 iterations for 28 values of beta."

socket cluster with 4 nodes on host ‘localhost’

Simulate from the prior to verify the critical value of :

tm <- system.time(matu <- foreach(i=1:length(beta), .packages=c("bayesImageS"), .combine='cbind') %dopar% { res <- swNoData(beta[i],k,neigh,block,iter) res$sum }) print(tm) save(matu, file=paste0("n",sqrt(n),"k",k,"_counts.rda")) stopCluster(cl)

user system elapsed

0.055 0.067 16.881

This shows the piecewise linear approximation that we used in our first paper (Moores et al., STCO 2015):

lrcst=approxfun(beta,colMeans(matu)) plot(beta,colMeans(matu),main="", xlab=expression(beta),ylab=expression(S(z)),asp=1) curve(lrcst,0,max(beta),add=T,col="blue") abline(v=bcrit,col="red",lty=3) abline(h=maxS,col=2,lty=2) points(0,E0,col=2,pch=2)

Instead, for **Stage 2** we will use Stan to fit a parametric integral curve:

functions { vector ft(vector t, real tC, real e0, real ecrit, real v0, real vmaxLo, real vmaxHi, real phi1, real phi2) { vector[num_elements(t)] mu; real sqrtBcritPhi = sqrt(tC)*phi1; for (i in 1:num_elements(t)) { if (t[i] <= tC) { real sqrtBdiffPhi = sqrt(tC - t[i])*phi1; mu[i] = e0 + t[i]*v0 - ((2*(vmaxLo-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi)); } else { real sqrtBdiff = sqrt(t[i] - tC); mu[i] = ecrit - ((2*vmaxHi)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2); } } return mu; } vector dfdt(vector t, real tC, real v0, real vmaxLo, real vmaxHi, real r1, real r2) { vector[num_elements(t)] dmu; for (i in 1:num_elements(t)) { if (t[i] <= tC) { dmu[i] = v0 + (vmaxLo-v0)*exp(-r1*sqrt(tC - t[i])); } else { dmu[i] = vmaxHi*exp(-r2*sqrt(t[i] - tC)); } } return dmu; } } data { int<lower = 1> M; int<lower = 1> N; real<lower = 1> maxY; real<lower = 1> Vlim; real<lower = 0> e0; real<lower = 0> v0; real tcrit; matrix<lower=0, upper=maxY>[M,N] y; vector[M] t; } parameters { real<lower = 0> a; real<lower = 0> b; real<lower = e0, upper=maxY> ecrit; real<lower = 0, upper=Vlim> vmaxLo; real<lower = 0, upper=Vlim> vmaxHi; } transformed parameters { vector[M] curr_mu; vector[M] curr_var; curr_mu = ft(t, tcrit, e0, ecrit, v0, vmaxLo, vmaxHi, a, b); curr_var = dfdt(t, tcrit, v0, vmaxLo, vmaxHi, a, b); } model { for (i in 1:M) { y[i,] ~ normal(curr_mu[i], sqrt(curr_var[i])); } }

For comparison, see a previous blog post where I fitted a simple, logistic curve using Stan.

library(rstan) options(mc.cores = min(4, parallel::detectCores())) dat <- list(M=length(beta), N=iter-burn+1, maxY=maxS, e0=E0, v0=V0, Vlim=2*maxS*log(maxS)/pi, tcrit=bcrit, y=t(matu[burn:iter,]), t=beta) tm2 <- system.time(fit <- sampling(PFAB, data = dat, verbose=TRUE, iter=5000, control = list(adapt_delta = 0.9, max_treedepth=20))) print(fit, pars=c("a","b","ecrit","vmaxLo","vmaxHi"), digits=3)

CHECKING DATA AND PREPROCESSING FOR MODEL 'stan-1aa3ff1f583' NOW.

COMPILING MODEL ‘stan-1aa3ff1f583’ NOW.

STARTING SAMPLER FOR MODEL ‘stan-1aa3ff1f583’ NOW.

starting worker pid=7953 on localhost:11107 at 21:01:28.616

starting worker pid=7961 on localhost:11107 at 21:01:28.832

starting worker pid=7969 on localhost:11107 at 21:01:29.056

starting worker pid=7977 on localhost:11107 at 21:01:29.267

Gradient evaluation took 0.000253 seconds

1000 transitions using 10 leapfrog steps per transition would take 2.53 seconds.

Adjust your expectations accordingly!

Elapsed Time: 317.369 seconds (Warm-up)

154.02 seconds (Sampling)

471.389 seconds (Total)

Roughly 8 minutes to fit the surrogate model makes this the most expensive step, but only because the first step was so fast. For much larger images (a megapixel or more), it will be the other way around – as shown in the paper.

ft <- function(t, tC, e0, ecrit, v0, vmax1, vmax2, phi1, phi2) { sqrtBcritPhi = sqrt(tC)*phi1 fval <- numeric(length(t)) for (i in 1:length(t)) { if (t[i] <= tC) { sqrtBdiffPhi = sqrt(tC - t[i])*phi1 fval[i] <- e0 + t[i]*v0 - ((2*(vmax1-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi)); } else { sqrtBdiff = sqrt(t[i] - tC) fval[i] <- ecrit - ((2*vmax2)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2); } } return(fval) } plot(range(beta),range(matu),type='n', xlab=expression(beta),ylab=expression(S(z))) idx <- burn+sample.int(iter-burn+1,size=20) abline(v=bcrit,col="red",lty=3) abline(h=maxS,col=2,lty=2) points(rep(beta,each=20),matu[idx,],pch=20) lines(beta, ft(beta, bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691), <span data-mce-type="bookmark" id="mce_SELREST_start" data-mce-style="overflow:hidden;line-height:0" style="overflow:hidden;line-height:0" ></span>col=4, lwd=2)

To really see how well this approximation fits the true model, we need to look at the residuals:

residMx <- matrix(nrow=iter-burn+1, ncol=length(beta)) for (b in 1:length(beta)) { residMx[,b] <- matu[burn:iter,b] - ft(beta[b], bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691) } dfdt <- function(t, tC, V0, Vmax1, Vmax2, r1, r2) { ifelse(t < tC, V0 + (Vmax1-V0)*exp(-r1*sqrt(tC - t)), Vmax2*exp(-r2*sqrt(t - tC))) } plot(range(beta),range(residMx),type='n',xlab=expression(beta),ylab="residuals") abline(h=0,lty=2,col=4,lwd=2) points(rep(beta,each=iter-burn+1),residMx,pch='.',cex=3) x <- sort(c(seq(0,3,by=0.01),bcrit)) lines(x, 3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2) lines(x, -3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)

This shows that 28 values of were enough to obtain a high-quality fit between the true model and the surrogate.

Now that we have our surrogate model, we can proceed to the final stage, which is to perform image segmentation using mcmcPotts:

mh <- list(algorithm="aux", bandwidth=0.02, Vmax1=59019, Vmax2=124668, E0=E0, Ecrit=14237, phi1=4.556, phi2=6.691, factor=1, bcrit=bcrit, V0=V0) priors <- list() priors$k <- k priors$mu <- c(0, 50, 100, 150, 200, 250) priors$mu.sd <- rep(10,k) priors$sigma <- rep(20,k) priors$sigma.nu <- rep(5, k) priors$beta <- c(0,3) iter <- 1e4 burn <- iter/2 y <- as.vector(as.matrix(Menteith)) tm3 <- system.time(resPFAB <- mcmcPotts(y,neigh,block,priors,mh,iter,burn)) print(tm3)

user system elapsed

52.332 0.638 13.666

Now we compare with the approximate exchange algorithm:

mh <- list(algorithm="ex", bandwidth=0.02, auxiliary=200) tm4 <- system.time(resAEA <- mcmcPotts(y,neigh,block,priors,mh,iter,burn)) print(tm4)

user system elapsed

7429.956 689.200 3236.957

Over 200 times speedup, in comparison to AEA. There is reasonably good agreement between the posterior distributions:

densPFAB <- density(resPFAB$beta[burn:iter]) densAEA <- density(resAEA$beta[burn:iter]) plot(densAEA, col=4, lty=2, lwd=2, main="", xlab=expression(beta), xlim=range(resPFAB$beta[burn:iter],resAEA$beta[burn:iter])) lines(densPFAB, col=2, lty=3, lwd=3) abline(h=0,lty=2) legend("topright",legend=c("AEA","PFAB"),col=c(4,2),lty=c(2,3), lwd=3)]]>

`mcmcPotts`

in my R package, `mcmcPottsNoData`

and `swNoData`

functions.

The most accurate way to measure convergence is using the coupling time of a perfect sampling algorithm, such as coupling from the past (CFTP). However, we can obtain a rough estimate by monitoring the distribution of the sufficient statistic:

Where δ(x,y) is the Kronecker delta function. Note that this sum is defined over the *unique* undirected edges of the lattice, to avoid double-counting. Under this definition, the critical temperature of the q-state Potts model is , or ≈0.88 for the Ising model with q=2 unique labels. Some papers state that the critical temperature of the Ising model is 0.44, but this is because they have used a different definition of S(z).

We will generate synthetic data for a sequence of values of the inverse temperature, β=(0.22,0.44,0.88,1.32,1.76,2.20):

library(bayesImageS) library(doParallel) set.seed(123) q <- 2 beta <- c(0.22, 0.44, 0.88, 1.32, 1.76, 2.20) mask <- matrix(1,nrow=500,ncol=500) n <- prod(dim(mask)) neigh <- getNeighbors(mask, c(2,2,0,0)) block <- getBlocks(mask, 2) edges <- getEdges(mask, c(2,2,0,0)) maxS <- nrow(edges) cl <- makeCluster(min(4, detectCores())) registerDoParallel(cl) system.time(synth <- foreach (i=1:length(beta), .packages="bayesImageS") %dopar% { { gen <- list() gen$beta <- beta[i] # generate labels sw <- swNoData(beta[i], q, neigh, block, 200) gen$z <- sw$z gen$sum <- sw$sum[200] # now add noise gen$mu <- rnorm(2, c(-1,1), 0.5) gen$sd <- 1/sqrt(rgamma(2, 1.5, 2)) gen$y <- rnorm(n, gen$mu[(gen$z[1:n,1])+1], gen$sd[(gen$z[1:n,1])+1]) gen }) stopCluster(cl)

## user system elapsed ## 0.307 0.065 20.271

Now let’s look at the distribution of Gibbs samples for the first dataset, using a fixed value of β:

priors <- list() priors$k <- q priors$mu <- c(-1,1) priors$mu.sd <- rep(0.5,q) priors$sigma <- rep(2,q) priors$sigma.nu <- rep(1.5,q) priors$beta <- rep(synth[[1]]$beta, 2) mh <- list(algorithm="ex", bandwidth=1, adaptive=NA, auxiliary=1) tm <- system.time(res <- mcmcPotts(synth[[1]]$y, neigh, block, priors, mh, 100, 50)) print(tm) ts.plot(res$sum, xlab="MCMC iterations", ylab=expression(S(z))) abline(h=synth[[1]]$sum, col=4, lty=2)

## user system elapsed ## 29.186 2.506 9.335

As expected for β=0.22 with n= 500×500 pixels, convergence takes only a dozen iterations or so. The same is true for β=0.66:

priors$beta <- rep(synth[[2]]$beta, 2) tm2 <- system.time(res2 <- mcmcPotts(synth[[2]]$y, neigh, block, priors, mh, 100, 50)) print(tm2) ts.plot(res2$sum, xlab="MCMC iterations", ylab=expression(S(z))) abline(h=synth[[2]]$sum, col=4, lty=2)

## user system elapsed ## 25.194 3.393 11.495

Now with β=0.88, just below the critical temperature:

priors$beta <- rep(synth[[3]]$beta, 2) tm3 <- system.time(res3 <- mcmcPotts(synth[[3]]$y, neigh, block, priors, mh, 100, 50)) print(tm3) ts.plot(res3$sum, xlab="MCMC iterations", ylab=expression(S(z))) abline(h=synth[[3]]$sum, col=4, lty=2)

## user system elapsed ## 26.658 3.361 11.444

So far, so good. Now let’s try with β=1.32:

priors$beta <- rep(synth[[4]]$beta, 2) tm4 <- system.time(res4 <- mcmcPotts(synth[[4]]$y, neigh, block, priors, mh, 300, 150)) print(tm4) ts.plot(res4$sum, xlab="MCMC iterations", ylab=expression(S(z))) abline(h=synth[[4]]$sum, col=4, lty=2)

## user system elapsed ## 88.414 9.170 30.481

This doesn’t really count as slow mixing, since the Gibbs sampler has converged within 300 iterations for a lattice with 500×500 pixels. Compare how long it takes without the external field:

system.time(res5 <- mcmcPottsNoData(synth[[4]]$beta, q, neigh, block, 20000))

## user system elapsed ## 1036.752 46.607 317.952

This explains why single-site Gibbs sampling should **never** be used for the auxiliary iterations in ABC or the exchange algorithm, but it is usually fine to use when updating the hidden labels. The Gaussian likelihood of the observed pixels, which is referred to in statistical mechanics as an “external field,” is assisting the model to converge to the correct stationary distribution. Without this additional information to give it a “nudge,” the Gibbs sampler is more likely to become stuck in a local mode. Note that all of these results have been for a fixed β. It is more difficult to assess convergence when β is unknown. A topic for a future post!

`mono_cftp_Ising`

function below implements monotonic CFTP for the Ising model (equivalent to the Potts model with only `q=2`

states). This algorithm returns a single, unbiased sample from the Ising model for a given inverse temperature, β. When combined with the exchange algorithm (Murray, Ghahramani & MacKay, 2006), this enables exact posterior inference for β. However, problems can occur when the value of β is too large, since the underlying single-site Gibbs sampler can fail to converge.

Previously, I’ve compared the Gibbs sampler with Swendsen-Wang for the Potts model, as implemented in my R package bayesImageS. I showed that the Gibbs sampler exhibits torpid mixing when β is larger than the critical value, . This slowdown can be quantified using CFTP, since it provides an accurate estimate of how many iterations an MCMC algorithm takes to converge.

Below the critical point, runtime is less than a second for 25,200 iterations of random scan, single-site Gibbs updates (T=5 recursions). The coalescence time roughly doubles with every increase in β (log-linear). At β=0.88, the average runtime is around 6 seconds for between 100k and 800k iterations. This trend accelerates beyond the critical point, requiring almost an hour for up to 838 ** million** iterations to converge. This is for an image with only n=400 pixels. This torpid mixing can play havoc with the exchange algorithm, as you can imagine. For example, see the numerical results reported by McGrory, Titterington, Reeves & Pettitt in their 2009 paper.

CFTP might not be all that useful in practice, but it forms the basis for more advanced algorithms such as the perfect slice sampler of Mira, Møller & Roberts (2001) or the bounding chain for Swendsen-Wang (Huber, 2003). My code for the perfect slice sampler in the gist below appears to have a bug, but `mono_cftp_Ising`

should be working fine.

Matti Vihola (Jyväskylä)

Matti presented an importance sampling (IS) correction for MCMC chains that do not target the exact posterior distribution (arXiv 1609.02541). This has advantages over other “exact-approximate” methods, such as delayed acceptance (DA), because the IS correction can be performed offline, in parallel. This has great potential to be combined with Bayesian indirect likelihood (BIL) methods, such as my surrogate model for ABC or Gaussian process approximation for pseudo-marginal MCMC.

Murray Pollock (Warwick)

Murray presented some algorithms for combining inference from multiple, distributed sub-posteriors. The essential idea is to run a coupled Markov chain for each sub-posterior. When the chains coalesce, this gives an unbiased sample from the combined posterior distribution. These parallel methods are related to the SCALE algorithm (arXiv 1609.03436).

Tamara Broderick (MIT)

Tamara’s talk combined ideas from Comp. Sci. for dimension reduction with statistical algorithms for Bayesian inference. A *coreset* is a weighted subsample of the data, which is intended to provide a low-dimensional representation while minimising information loss. This can provide superior results over naïve subsampling methods, such as stochastic gradient descent (SGD) or stochastic gradient Langevin dynamics (SGLD). More details are available in her NIPS 2016 paper and the article homepage.

Helen Ogden (Southampton)

Helen presented some theoretical results for convergence of Laplace approximations to latent variable models and composite likelihood for the Ising model (arXiv 1601.07911). In both cases, she measures the approximation error using the distance from the score function of the true model. For the Laplace approximation, the number of observations needs to grow at a rate proportional to the number of latent variables. She also showed that the reduced dependence approximation (RDA) has polynomial computational cost when the inverse temperature β is below the critical value. Some of these ideas have been implemented in her R package **glmmsr** (available on CRAN).

Håvard Rue (KAUST)

This talk was particularly interesting for the discussion of diminishing returns from sparse matrix representations as dimension increases. Integrated, nested Laplace approximations (INLA) have enjoyed great success for approximate Bayesian inference on generalised linear models (GLM) or generalised additive models (GAM), particularly with 1D (HMM) or 2D (MRF) correlation structures.

Christian Robert (U. Paris Dauphine & U. Warwick)

Approximate Bayesian computation (ABC) is crucially dependent on the choice of distance function between the observations and pseudo-data. X’ian showed that the earth mover’s distance (EMD) or Wasserstein metric has some particularly useful properties in this context (arXiv 1701.05146). For some models, the Wasserstein metric can be computed directly from the parameter values, without any need for simulation of pseudo-data.

Kerrie Mengersen (QUT)

Kerrie discussed the issue of geographic transferability for ecological models. The key question is how much a model trained in one specific context can be generalised to other settings, for example as an informative prior, through a hierarchical model, or in the experimental design. Difficulties arise when covariates are missing in one location, or measured in a different way. Approaches include history matching for partially-informative priors (arXiv 1605.08860) and decision-theoretic subsampling of data (*Stat. Sci.* 2017).