Depending on your configuration, you might need to edit the following file:

/Library/Frameworks/R.framework/Resources/etc/Makeconf

and change this line:

MAIN_LDFLAGS = -fopenmp

to something like this (depending on where you installed CUDA):

MAIN_LDFLAGS = -L/usr/local/cuda/lib

This fixes the following error from nvcc:

** arch - /usr/local/cuda/bin/nvcc -shared -fopenmp -L/usr/local/lib -F/Library/Frameworks/R.framework/.. -framework R -lpcre -llzma -lbz2 -lz -licucore -lm -liconv -lpcre -llzma -lbz2 -lz -licucore -lm -liconv -lcublas -lnvrtc -lcuda rinterface.o mi.o sort.o granger.o qrdecomp.o correlation.o hcluster.o distance.o matmult.o lsfit.o kendall.o cuseful.o -o gputools.so
nvcc fatal : Unknown option 'fopenmp'
make: *** [gputools.so] Error 1
ERROR: compilation failed for package ‘gputools’

**Note**: this is probably why the package was removed from CRAN…

You might also need to edit ~/.R/Makevars if you followed my previous instructions on how to compile parallel OpenMP code on macOS X.

There is a second line that also causes problems with nvcc:

LIBR = -F/Library/Frameworks/R.framework/.. -framework R

Thanks to this post on StackExchange, which references this post in the nVidia forum, this line should be changed to:

LIBR = -Xlinker -framework,R

Finally, remember to set the following environment variables:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib/:$DYLD_LIBRARY_PATH

**Final note**: system-wide changes to Makeconf are generally a *very* bad idea. The instructions above are likely to break compilation for any other (non-CUDA) R packages. Therefore, I would recommend reverting all of these changes once **gputools** has been successfully installed. Alternatively, you might want to investigate other R packages that provide CUDA support…

More details about the model and SMC algorithm are available in my preprint on arXiv (Moores et al., 2016; v2 2018). The following gives an example of applying **serrsBayes** to surface-enhanced Raman spectroscopy (SERS) from a previous paper (Gracie et al., 2016).

This is a type of functional data analysis (Ramsay et al., 2009), since the discretised spectrum is represented using latent (unobserved), continuous functions. The background fluorescence is estimated using a penalised B-spline (Wood, 2017), while the peaks can be modelled as Gaussian, Lorentzian, or pseudo-Voigt functions.

The Voigt function is a *convolution* of a Gaussian and a Lorentzian. In practice it is approximated by the pseudo-Voigt, a weighted mixture of the two, with an additional parameter $\eta_p \in [0,1]$ that equals 0 for pure Gaussian and 1 for Lorentzian:

$$s_p(\tilde{\nu}_j) = A_p \left[ \eta_p\, \mathcal{L}(\tilde{\nu}_j; \ell_p, \varphi_p) + (1 - \eta_p)\, \mathcal{G}(\tilde{\nu}_j; \ell_p, \varphi_p) \right]$$

where $A_p$ is the amplitude of peak $p$; $\ell_p$ is the peak location; and $\varphi_p$ is the broadening. The horizontal axis of a Raman spectrum is measured in wavenumbers $\tilde{\nu}$, with units of inverse centimetres (cm⁻¹). The vertical axis is measured in arbitrary units (a.u.), since the intensity of the Raman signal depends on the properties of the spectrometer.
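To make the pseudo-Voigt concrete, here is a minimal base-R sketch. This is my own illustration: the function name `pseudoVoigt` and the unit-height normalisation are assumptions, not the package's internal code (serrsBayes uses its own `mixedVoigt` function).

```r
# Pseudo-Voigt peak as a weighted mixture of a Gaussian and a Lorentzian.
# eta = 0 gives a pure Gaussian; eta = 1 gives a pure Lorentzian.
pseudoVoigt <- function(nu, amplitude, location, scale, eta) {
  gauss   <- exp(-(nu - location)^2 / (2 * scale^2))   # Gaussian kernel
  lorentz <- scale^2 / ((nu - location)^2 + scale^2)   # Lorentzian kernel
  amplitude * (eta * lorentz + (1 - eta) * gauss)
}

wn <- seq(600, 1800, by = 0.5)   # wavenumber grid (1/cm)
peak <- pseudoVoigt(wn, amplitude = 1000, location = 1350, scale = 15, eta = 0.5)
```

Both kernels equal 1 at the peak location, so the maximum of the mixture is the amplitude regardless of eta; the Lorentzian contributes the heavier tails.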

We can download some SERS spectra in a zip file:

tmp <- tempfile()
download.file("https://pure.strath.ac.uk/portal/files/43595106/Figure_2.zip", tmp)
tmp2 <- unzip(tmp, "Figure 2/T20 SERS spectra/T20_1_ REP1 Well_A1.SPC")

trying URL 'https://pure.strath.ac.uk/portal/files/43595106/Figure_2.zip'

downloaded 270 KB

This data is in the binary SPC file format used by Grams/AI. Fortunately, we can use the R package **hyperSpec** to read this file and plot the spectrum:

library(hyperSpec)
spcT20 <- read.spc(tmp2)
plot(spcT20[1,], col=4, wl.range=600~1800,
     title.args=list(main="Raman Spectrum of TAMRA+DNA"))
spectra <- spcT20[1,,600~1800]
wavenumbers <- wl(spectra)
nWL <- length(wavenumbers)

We will use the same priors that were described in the paper (Moores et al., 2016), including the TD-DFT peak locations from Watanabe et al. (2005):

peakLocations <- c(615, 631, 664, 673, 702, 705, 771, 819, 895, 923,
                   1014, 1047, 1049, 1084, 1125, 1175, 1192, 1273, 1291,
                   1307, 1351, 1388, 1390, 1419, 1458, 1505, 1530, 1577,
                   1601, 1615, 1652, 1716)
nPK <- length(peakLocations)
priors <- list(loc.mu=peakLocations, loc.sd=rep(50,nPK),
               scaG.mu=log(16.47) - (0.34^2)/2, scaG.sd=0.34,
               scaL.mu=log(25.27) - (0.4^2)/2, scaL.sd=0.4,
               noise.nu=5, noise.sd=50, bl.smooth=1, bl.knots=121)

Now we run the SMC algorithm to fit the model:

library(serrsBayes)
tm <- system.time(result <- fitVoigtPeaksSMC(wavenumbers, as.matrix(spectra),
                                             priors, npart=2000))
result$time <- tm
save(result, file="Figure 2/result.rda")

[1] "SMC with 1 observations at 1 unique concentrations, 2000 particles, and 2401 wavenumbers."

[1] "Step 0: computing 125 B-spline basis functions (r=10) took 0.28sec."

[1] "Mean noise parameter sigma is now 60.3304671005565"

[1] "Mean spline penalty lambda is now 1"

[1] "Step 1: initialization for 32 Voigt peaks took 24.959 sec."

[1] "Reweighting took 1.208sec. for ESS 1800.80025019536 with new kappa 0.00096893310546875."

[1] "Iteration 2 took 253.487sec. for 10 MCMC loops (acceptance rate 0.3053)"

[1] "Reweighting took 1.07499999999999sec. for ESS 1621.343255666 with new kappa 0.00144911924144253."

. . .

[1] "Iteration 239 took 250.380000000005sec. for 10 MCMC loops (acceptance rate 0.2247)"

[1] "Reweighting took 0.0559999999968568sec. for ESS 1270.7842854632 with new kappa 1."

[1] "Iteration 240 took 249.332999999999sec. for 10 MCMC loops (acceptance rate 0.2313)"

The default values for the number of particles, Markov chain steps, and learning rate can be somewhat conservative, depending on the application. Unfortunately, the new function fitVoigtPeaksSMC has not been parallelised yet, so it only runs on a single core. Thus, it can take a long time to fit the model with 32 peaks and 2401 wavenumbers:

print(paste(result$time["elapsed"]/3600,"hours for",length(result$ess),"SMC iterations."))

[1] "16.4389 hours for 240 SMC iterations."

The downside of choosing smaller values for these tuning parameters is that you run the risk of the SMC collapsing. The quality of the particle distribution deteriorates with each iteration, as measured by the effective sample size (ESS):

plot.ts(result$ess, ylab="ESS", main="Effective Sample Size",
        xlab="SMC iteration")
abline(h=length(result$sigma)/2, col=4, lty=2)
abline(h=0, lty=2)
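For reference, the ESS plotted here is computed from the normalised particle weights as $1/\sum_i w_i^2$. A base-R sketch (my own helper function, not the package's internal code):

```r
# ESS of a weighted particle population: equals the number of particles
# when the weights are uniform, and collapses towards 1 as they degenerate.
# Works on log-weights to avoid numerical underflow.
effectiveSampleSize <- function(logWeights) {
  w <- exp(logWeights - max(logWeights))
  w <- w / sum(w)
  1 / sum(w^2)
}

effectiveSampleSize(rep(0, 2000))          # uniform weights: ESS = 2000
effectiveSampleSize(c(0, rep(-20, 1999)))  # one dominant particle: ESS near 1
```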

Note: this is very bad! The variance of the importance sampling estimator is unbounded in this case. The resampling step is intended to refresh the particles, but this introduces duplicates into the population. The Metropolis-Hastings (M-H) steps move some of the particles, but the bandwidths of the random walk proposals are chosen adaptively, based on the particle distribution. If this degenerates too far, then the M-H acceptance rate will also fall to zero:
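The adaptive bandwidth mechanism can be sketched as follows. This is a generic recipe (scaling the proposal standard deviation by the weighted spread of the particles), not the exact rule used in serrsBayes; the function name and the 2.38 scaling constant are my own assumptions.

```r
# Adaptive random-walk bandwidth: scale the proposal s.d. for each parameter
# by the weighted standard deviation of the current particles. If the
# population degenerates to duplicates, the bandwidth collapses to zero and
# the Metropolis-Hastings chain stops moving.
adaptBandwidth <- function(particles, weights, factor = 2.38) {
  w <- weights / sum(weights)
  wmean <- colSums(w * particles)
  wvar  <- colSums(w * sweep(particles, 2, wmean)^2)
  factor * sqrt(wvar / ncol(particles))
}

set.seed(1)
healthy   <- matrix(rnorm(2000 * 3), ncol = 3)  # diverse particle population
collapsed <- healthy[rep(1, 2000), ]            # every particle a duplicate
adaptBandwidth(healthy, rep(1, 2000))           # positive bandwidths
adaptBandwidth(collapsed, rep(1, 2000))         # all zero: the chain is stuck
</imports>
```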

If SMC collapses, the best solution is to increase the number of particles and run it again. Thus, choosing a conservative number to begin with is a sensible strategy. With 2000 particles and 10 M-H steps per SMC iteration, the algorithm converges to the target distribution:

A subsample of particles can be used to plot the posterior distribution of the baseline and peaks:

samp.idx <- sample.int(length(result$weights), 50, prob=result$weights)
samp.mat <- resid.mat <- matrix(0, nrow=length(samp.idx), ncol=nWL)
samp.sigi <- samp.lambda <- numeric(length=nrow(samp.mat))
spectra <- as.matrix(spectra)
plot(wavenumbers, spectra[1,], type='l', xlab="Raman offset", ylab="intensity")
for (pt in 1:length(samp.idx)) {
  k <- samp.idx[pt]
  samp.mat[pt,] <- mixedVoigt(result$location[k,], result$scale_G[k,],
                              result$scale_L[k,], result$beta[k,], wavenumbers)
  samp.sigi[pt] <- result$sigma[k]
  samp.lambda[pt] <- result$lambda[k]

  Obsi <- spectra[1,] - samp.mat[pt,]
  g0_Cal <- length(Obsi) * samp.lambda[pt] * result$priors$bl.precision
  gi_Cal <- crossprod(result$priors$bl.basis) + g0_Cal
  mi_Cal <- as.vector(solve(gi_Cal, crossprod(result$priors$bl.basis, Obsi)))

  bl.est <- result$priors$bl.basis %*% mi_Cal # smoothed residuals = estimated baseline
  lines(wavenumbers, bl.est, col="#C3000020")
  lines(wavenumbers, bl.est + samp.mat[pt,], col="#0000C30F")
  resid.mat[pt,] <- Obsi - bl.est[,1]
}
title(main="Baseline for TAMRA")

Notice that the uncertainty in the baseline is greatest where the peaks are bunched close together, which is exactly what we would expect. This is also reflected in the uncertainty of the spectral signature:

plot(range(wavenumbers), range(samp.mat), type='n',
     xlab="Raman offset", ylab="Intensity")
abline(h=0, lty=2)
for (pt in 1:length(samp.idx)) {
  lines(wavenumbers, samp.mat[pt,], col="#0000C330")
  lines(wavenumbers, resid.mat[pt,] + samp.mat[pt,], col="#00000020")
}
title(main="Spectral Signature")

Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. 2006. “Sequential Monte Carlo Samplers.” *J. R. Stat. Soc. Ser. B* 68 (3): 411–36. doi:10.1111/j.1467-9868.2006.00553.x.

Gracie, K., M. Moores, W. E. Smith, Kerry Harding, M. Girolami, D. Graham, and K. Faulds. 2016. “Preferential Attachment of Specific Fluorescent Dyes and Dye Labelled DNA Sequences in a SERS Multiplex.” *Anal. Chem.* 88 (2): 1147–53. doi:10.1021/acs.analchem.5b02776.

Jacob, Pierre E., Lawrence M. Murray, and Sylvain Rubenthaler. 2015. “Path Storage in the Particle Filter.” *Stat. Comput.* 25 (2): 487–96. doi:10.1007/s11222-013-9445-x.

Lee, Anthony, and Nick Whiteley. 2015. “Variance Estimation in the Particle Filter.” *arXiv Preprint arXiv:1509.00394 [Stat.CO]*. https://arxiv.org/abs/1509.00394.

Moores, M., K. Gracie, J. Carson, K. Faulds, D. Graham, and M. Girolami. 2016. “Bayesian Modelling and Quantification of Raman Spectroscopy.” *arXiv Preprint arXiv:1604.07299 [Stat.AP]*. http://arxiv.org/abs/1604.07299.

Ramsay, Jim O., Giles Hooker, and Spencer Graves. 2009. *Functional Data Analysis with R and MATLAB*. Use R! New York: Springer. doi:10.1007/978-0-387-98185-7.

Watanabe, Hiroyuki, Norihiko Hayazawa, Yasushi Inouye, and Satoshi Kawata. 2005. “DFT Vibrational Calculations of Rhodamine 6G Adsorbed on Silver: Analysis of Tip-Enhanced Raman Spectroscopy.” *J. Phys. Chem. B* 109 (11): 5012–20. doi:10.1021/jp045771u.

Wood, Simon N. 2017. *Generalized Additive Models: An Introduction with R*. 2nd ed. Boca Raton, FL, USA: Chapman & Hall/CRC Press. https://people.maths.bris.ac.uk/~sw15190/igam/index.html.


If you want to destroy my sweater

Hold this thread as I walk away

*Undone — Weezer*

I received an unexpected email about the new version 0.5-0 of bayesImageS:

Dear maintainer,

Please see the problems shown on

<https://cran.r-project.org/web/checks/check_results_bayesImageS.html>.

Please correct before 2018-02-11 to safely retain your package on CRAN.

*(so unexpected, it was actually filtered into my junk email folder…)*

Version: 0.5-0
Check: Rd cross-references
Result: WARN
    Unknown package ‘PottsUtils’ in Rd xrefs

Unknown? I could have sworn that package was available on CRAN the last time I checked!

Package ‘PottsUtils’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as it depends on the archived ‘miscF’.

Oh, dear…

Package ‘miscF’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as it depends on the archived ‘BayesBridge’, and on the non-portable ‘BRugs’.

It’s a massacre!

Package ‘BayesBridge’ was removed from the CRAN repository.

Formerly available versions can be obtained from the archive.

Archived on 2018-01-27 as no corrections were received despite reminders.

The hastily-constructed v0.5-1 of **bayesImageS** removes all Rd cross-references to **PottsUtils** and **mritc**, which corrects all of the NOTEs and WARNings from CRAN. The original version 0.2-1 of **PottsUtils** was released on CRAN in April 2011, 2 months after I started my PhD. I’d like to thank the package author, Dai Feng, and his PhD supervisor, Prof. Luke Tierney, for releasing their software as open source and for maintaining it for the past 7 years. It was a big reason why I chose R rather than Python for my work in image analysis. *In memoriam.*

As a warning to other package authors, be careful what dependencies you choose to include. Also, make sure emails from @R-project.org aren’t filtered as spam. At best, you’re only ever 2 weeks from having your R package permanently removed from the CRAN repository!


PFAB splits computation into 3 stages:

- Simulation for fixed β using Swendsen-Wang
- Fitting a parametric surrogate model using Stan
- Approximate posterior inference using Metropolis-within-Gibbs

For **Stage 1**, I used 2000 iterations of SW for each of 72 values of β, but this is really overkill for most applications. I chose 72 values because I happened to have a 36-core, hyperthreaded CPU available. Here I’ll just be running everything on my laptop (an i7 CPU with 4 hyperthreaded cores), so 28 values should be plenty. The idea is to have higher density closer to the critical temperature, where the variance (and hence the gradient of the score function) is greatest.

For our precomputation step, we need to know the image dimensions and the number of labels that we will use for pixel classification. We’ll be using the Lake of Menteith dataset from Bayesian Essentials with R (Marin & Robert, 2014):

library(bayess)
data("Menteith")
iter <- 800
burn <- iter/4 + 1
n <- prod(dim(Menteith))
k <- 6
image(as.matrix(Menteith), asp=1, xaxt='n', yaxt='n', col=gray(0:255/255))

The precomputation step is usually the most expensive part, but for 100×100 pixels it should only take around 15 to 20 seconds:

library(bayesImageS)
bcrit <- log(1 + sqrt(k))
beta <- sort(c(seq(0,1,by=0.1), seq(1.05,1.15,by=0.05),
               bcrit-0.05, bcrit-0.02, bcrit+0.02,
               seq(1.3,1.4,by=0.05), seq(1.5,2,by=0.1), 2.5, 3))
mask <- matrix(1, nrow=sqrt(n), ncol=sqrt(n))
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
maxS <- nrow(edges)
E0 <- maxS/k
V0 <- maxS*(1/k)*(1 - 1/k)

Embarrassingly parallel, using all available CPU cores:

library(doParallel)
cores <- min(detectCores(), length(beta))
print(paste("Parallel computation using", cores, "CPU cores:",
            iter, "iterations for", length(beta), "values of beta."))
cl <- makeForkCluster(cores, outfile="")
print(cl)
clusterSetRNGStream(cl)
registerDoParallel(cl)

[1] "Parallel computation using 4 CPU cores: 800 iterations for 28 values of beta."

socket cluster with 4 nodes on host ‘localhost’

Simulate from the prior to verify the critical value of β:

tm <- system.time(matu <- foreach(i=1:length(beta), .packages=c("bayesImageS"),
                                  .combine='cbind') %dopar% {
  res <- swNoData(beta[i], k, neigh, block, iter)
  res$sum
})
print(tm)
save(matu, file=paste0("n", sqrt(n), "k", k, "_counts.rda"))
stopCluster(cl)

user system elapsed

0.055 0.067 16.881

This shows the piecewise linear approximation that we used in our first paper (Moores et al., STCO 2015):

lrcst <- approxfun(beta, colMeans(matu))
plot(beta, colMeans(matu), main="", xlab=expression(beta),
     ylab=expression(S(z)), asp=1)
curve(lrcst, 0, max(beta), add=TRUE, col="blue")
abline(v=bcrit, col="red", lty=3)
abline(h=maxS, col=2, lty=2)
points(0, E0, col=2, pch=2)

Instead, for **Stage 2** we will use Stan to fit a parametric integral curve:

functions {
  vector ft(vector t, real tC, real e0, real ecrit, real v0,
            real vmaxLo, real vmaxHi, real phi1, real phi2) {
    vector[num_elements(t)] mu;
    real sqrtBcritPhi = sqrt(tC)*phi1;
    for (i in 1:num_elements(t)) {
      if (t[i] <= tC) {
        real sqrtBdiffPhi = sqrt(tC - t[i])*phi1;
        mu[i] = e0 + t[i]*v0 - ((2*(vmaxLo-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi));
      } else {
        real sqrtBdiff = sqrt(t[i] - tC);
        mu[i] = ecrit - ((2*vmaxHi)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2);
      }
    }
    return mu;
  }
  vector dfdt(vector t, real tC, real v0, real vmaxLo, real vmaxHi,
              real r1, real r2) {
    vector[num_elements(t)] dmu;
    for (i in 1:num_elements(t)) {
      if (t[i] <= tC) {
        dmu[i] = v0 + (vmaxLo-v0)*exp(-r1*sqrt(tC - t[i]));
      } else {
        dmu[i] = vmaxHi*exp(-r2*sqrt(t[i] - tC));
      }
    }
    return dmu;
  }
}
data {
  int<lower = 1> M;
  int<lower = 1> N;
  real<lower = 1> maxY;
  real<lower = 1> Vlim;
  real<lower = 0> e0;
  real<lower = 0> v0;
  real tcrit;
  matrix<lower=0, upper=maxY>[M,N] y;
  vector[M] t;
}
parameters {
  real<lower = 0> a;
  real<lower = 0> b;
  real<lower = e0, upper=maxY> ecrit;
  real<lower = 0, upper=Vlim> vmaxLo;
  real<lower = 0, upper=Vlim> vmaxHi;
}
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_var;
  curr_mu = ft(t, tcrit, e0, ecrit, v0, vmaxLo, vmaxHi, a, b);
  curr_var = dfdt(t, tcrit, v0, vmaxLo, vmaxHi, a, b);
}
model {
  for (i in 1:M) {
    y[i,] ~ normal(curr_mu[i], sqrt(curr_var[i]));
  }
}

For comparison, see a previous blog post where I fitted a simple, logistic curve using Stan.

library(rstan)
options(mc.cores = min(4, parallel::detectCores()))
dat <- list(M=length(beta), N=iter-burn+1, maxY=maxS, e0=E0, v0=V0,
            Vlim=2*maxS*log(maxS)/pi, tcrit=bcrit,
            y=t(matu[burn:iter,]), t=beta)
tm2 <- system.time(fit <- sampling(PFAB, data = dat, verbose=TRUE, iter=5000,
                                   control = list(adapt_delta = 0.9,
                                                  max_treedepth=20)))
print(fit, pars=c("a","b","ecrit","vmaxLo","vmaxHi"), digits=3)

CHECKING DATA AND PREPROCESSING FOR MODEL 'stan-1aa3ff1f583' NOW.

COMPILING MODEL ‘stan-1aa3ff1f583’ NOW.

STARTING SAMPLER FOR MODEL ‘stan-1aa3ff1f583’ NOW.

starting worker pid=7953 on localhost:11107 at 21:01:28.616

starting worker pid=7961 on localhost:11107 at 21:01:28.832

starting worker pid=7969 on localhost:11107 at 21:01:29.056

starting worker pid=7977 on localhost:11107 at 21:01:29.267

Gradient evaluation took 0.000253 seconds

1000 transitions using 10 leapfrog steps per transition would take 2.53 seconds.

Adjust your expectations accordingly!

Elapsed Time: 317.369 seconds (Warm-up)

154.02 seconds (Sampling)

471.389 seconds (Total)

Roughly 8 minutes to fit the surrogate model makes this the most expensive step, but only because the first step was so fast. For much larger images (a megapixel or more), it will be the other way around – as shown in the paper.

ft <- function(t, tC, e0, ecrit, v0, vmax1, vmax2, phi1, phi2) {
  sqrtBcritPhi <- sqrt(tC)*phi1
  fval <- numeric(length(t))
  for (i in 1:length(t)) {
    if (t[i] <= tC) {
      sqrtBdiffPhi <- sqrt(tC - t[i])*phi1
      fval[i] <- e0 + t[i]*v0 - ((2*(vmax1-v0))/(phi1^2))*((sqrtBcritPhi + 1)/exp(sqrtBcritPhi) - (sqrtBdiffPhi + 1)/exp(sqrtBdiffPhi))
    } else {
      sqrtBdiff <- sqrt(t[i] - tC)
      fval[i] <- ecrit - ((2*vmax2)/phi2)*(sqrtBdiff/exp(phi2*sqrtBdiff) + (exp(-phi2*sqrtBdiff) - 1)/phi2)
    }
  }
  return(fval)
}
plot(range(beta), range(matu), type='n',
     xlab=expression(beta), ylab=expression(S(z)))
idx <- burn + sample.int(iter-burn+1, size=20)
abline(v=bcrit, col="red", lty=3)
abline(h=maxS, col=2, lty=2)
points(rep(beta, each=20), matu[idx,], pch=20)
lines(beta, ft(beta, bcrit, E0, 14237, V0, 59019, 124668, 4.556, 6.691),
      col=4, lwd=2)

To really see how well this approximation fits the true model, we need to look at the residuals:

residMx <- matrix(nrow=iter-burn+1, ncol=length(beta))
for (b in 1:length(beta)) {
  residMx[,b] <- matu[burn:iter,b] - ft(beta[b], bcrit, E0, 14237, V0,
                                        59019, 124668, 4.556, 6.691)
}
dfdt <- function(t, tC, V0, Vmax1, Vmax2, r1, r2) {
  ifelse(t < tC, V0 + (Vmax1-V0)*exp(-r1*sqrt(tC - t)),
         Vmax2*exp(-r2*sqrt(t - tC)))
}
plot(range(beta), range(residMx), type='n',
     xlab=expression(beta), ylab="residuals")
abline(h=0, lty=2, col=4, lwd=2)
points(rep(beta, each=iter-burn+1), residMx, pch='.', cex=3)
x <- sort(c(seq(0,3,by=0.01), bcrit))
lines(x, 3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)
lines(x, -3*sqrt(dfdt(x, bcrit, V0, 59019, 124668, 4.556, 6.691)), col=2, lwd=2)

This shows that 28 values of β were enough to obtain a high-quality fit between the true model and the surrogate.

Now that we have our surrogate model, we can proceed to the final stage, which is to perform image segmentation using mcmcPotts:

mh <- list(algorithm="aux", bandwidth=0.02, Vmax1=59019, Vmax2=124668,
           E0=E0, Ecrit=14237, phi1=4.556, phi2=6.691, factor=1,
           bcrit=bcrit, V0=V0)
priors <- list()
priors$k <- k
priors$mu <- c(0, 50, 100, 150, 200, 250)
priors$mu.sd <- rep(10,k)
priors$sigma <- rep(20,k)
priors$sigma.nu <- rep(5, k)
priors$beta <- c(0,3)
iter <- 1e4
burn <- iter/2
y <- as.vector(as.matrix(Menteith))
tm3 <- system.time(resPFAB <- mcmcPotts(y, neigh, block, priors, mh, iter, burn))
print(tm3)

user system elapsed

52.332 0.638 13.666

Now we compare with the approximate exchange algorithm:

mh <- list(algorithm="ex", bandwidth=0.02, auxiliary=200)
tm4 <- system.time(resAEA <- mcmcPotts(y, neigh, block, priors, mh, iter, burn))
print(tm4)

user system elapsed

7429.956 689.200 3236.957
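Dividing the elapsed time for the AEA run by the elapsed time for the PFAB run reported above gives the speedup factor:

```r
# Speedup of PFAB over the approximate exchange algorithm,
# using the elapsed times reported above (in seconds).
speedup <- 3236.957 / 13.666
round(speedup)  # approximately 237
```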

This is a speedup of more than 200 times in comparison with the AEA. There is reasonably good agreement between the two posterior distributions:

densPFAB <- density(resPFAB$beta[burn:iter])
densAEA <- density(resAEA$beta[burn:iter])
plot(densAEA, col=4, lty=2, lwd=2, main="", xlab=expression(beta),
     xlim=range(resPFAB$beta[burn:iter], resAEA$beta[burn:iter]))
lines(densPFAB, col=2, lty=3, lwd=3)
abline(h=0, lty=2)
legend("topright", legend=c("AEA","PFAB"), col=c(4,2), lty=c(2,3), lwd=3)

This post uses the `mcmcPotts`, `mcmcPottsNoData`, and `swNoData` functions in my R package, **bayesImageS**.

The most accurate way to measure convergence is using the coupling time of a perfect sampling algorithm, such as coupling from the past (CFTP). However, we can obtain a rough estimate by monitoring the distribution of the sufficient statistic:

$$S(\mathbf{z}) = \sum_{(i,j) \in \mathcal{E}} \delta(z_i, z_j)$$

where δ(x,y) is the Kronecker delta function. Note that this sum is defined over the *unique* undirected edges $\mathcal{E}$ of the lattice, to avoid double-counting. Under this definition, the critical temperature of the q-state Potts model is log(1 + √q), or ≈0.88 for the Ising model with q=2 unique labels. Some papers state that the critical temperature of the Ising model is 0.44, but this is because they have used a different definition of S(z).
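As a base-R illustration of this definition (my own code, for a tiny lattice; in practice the edge list comes from `getEdges` in bayesImageS):

```r
# Sufficient statistic of the Potts model: the number of matching label
# pairs, summed over the unique undirected edges of the lattice.
sumStat <- function(z, edges) {
  sum(z[edges[, 1]] == z[edges[, 2]])
}

# Unique first-order (horizontal + vertical) edges of a 3x3 lattice,
# with pixels indexed column-major as in an R matrix:
idx <- matrix(1:9, nrow = 3)
horiz <- cbind(as.vector(idx[, -3]), as.vector(idx[, -1]))
vert  <- cbind(as.vector(idx[-3, ]), as.vector(idx[-1, ]))
edges <- rbind(horiz, vert)   # 12 unique edges, no double-counting

sumStat(rep(1, 9), edges)  # all labels equal: S(z) = 12, the maximum
sumStat(1:9, edges)        # all labels distinct: S(z) = 0
```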

We will generate synthetic data for a sequence of values of the inverse temperature, β=(0.22,0.44,0.88,1.32,1.76,2.20):

library(bayesImageS)
library(doParallel)
set.seed(123)
q <- 2
beta <- c(0.22, 0.44, 0.88, 1.32, 1.76, 2.20)
mask <- matrix(1, nrow=500, ncol=500)
n <- prod(dim(mask))
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
maxS <- nrow(edges)
cl <- makeCluster(min(4, detectCores()))
registerDoParallel(cl)
system.time(synth <- foreach (i=1:length(beta),
                              .packages="bayesImageS") %dopar% {
  gen <- list()
  gen$beta <- beta[i]
  # generate labels
  sw <- swNoData(beta[i], q, neigh, block, 200)
  gen$z <- sw$z
  gen$sum <- sw$sum[200]
  # now add noise
  gen$mu <- rnorm(2, c(-1,1), 0.5)
  gen$sd <- 1/sqrt(rgamma(2, 1.5, 2))
  gen$y <- rnorm(n, gen$mu[(gen$z[1:n,1])+1], gen$sd[(gen$z[1:n,1])+1])
  gen
})
stopCluster(cl)

##    user  system elapsed 
##   0.307   0.065  20.271

Now let’s look at the distribution of Gibbs samples for the first dataset, using a fixed value of β:

priors <- list()
priors$k <- q
priors$mu <- c(-1,1)
priors$mu.sd <- rep(0.5,q)
priors$sigma <- rep(2,q)
priors$sigma.nu <- rep(1.5,q)
priors$beta <- rep(synth[[1]]$beta, 2)
mh <- list(algorithm="ex", bandwidth=1, adaptive=NA, auxiliary=1)
tm <- system.time(res <- mcmcPotts(synth[[1]]$y, neigh, block,
                                   priors, mh, 100, 50))
print(tm)
ts.plot(res$sum, xlab="MCMC iterations", ylab=expression(S(z)))
abline(h=synth[[1]]$sum, col=4, lty=2)

##    user  system elapsed 
##  29.186   2.506   9.335

As expected for β=0.22 with n = 500×500 pixels, convergence takes only a dozen iterations or so. The same is true for β=0.44:

priors$beta <- rep(synth[[2]]$beta, 2)
tm2 <- system.time(res2 <- mcmcPotts(synth[[2]]$y, neigh, block,
                                     priors, mh, 100, 50))
print(tm2)
ts.plot(res2$sum, xlab="MCMC iterations", ylab=expression(S(z)))
abline(h=synth[[2]]$sum, col=4, lty=2)

##    user  system elapsed 
##  25.194   3.393  11.495

Now with β=0.88, just below the critical temperature:

priors$beta <- rep(synth[[3]]$beta, 2)
tm3 <- system.time(res3 <- mcmcPotts(synth[[3]]$y, neigh, block,
                                     priors, mh, 100, 50))
print(tm3)
ts.plot(res3$sum, xlab="MCMC iterations", ylab=expression(S(z)))
abline(h=synth[[3]]$sum, col=4, lty=2)

##    user  system elapsed 
##  26.658   3.361  11.444

So far, so good. Now let’s try with β=1.32:

priors$beta <- rep(synth[[4]]$beta, 2)
tm4 <- system.time(res4 <- mcmcPotts(synth[[4]]$y, neigh, block,
                                     priors, mh, 300, 150))
print(tm4)
ts.plot(res4$sum, xlab="MCMC iterations", ylab=expression(S(z)))
abline(h=synth[[4]]$sum, col=4, lty=2)

##    user  system elapsed 
##  88.414   9.170  30.481

This doesn’t really count as slow mixing, since the Gibbs sampler has converged within 300 iterations for a lattice with 500×500 pixels. Compare how long it takes without the external field:

system.time(res5 <- mcmcPottsNoData(synth[[4]]$beta, q, neigh, block, 20000))

##      user   system  elapsed 
##  1036.752   46.607  317.952

This explains why single-site Gibbs sampling should **never** be used for the auxiliary iterations in ABC or the exchange algorithm, but it is usually fine to use when updating the hidden labels. The Gaussian likelihood of the observed pixels, which is referred to in statistical mechanics as an “external field,” is assisting the model to converge to the correct stationary distribution. Without this additional information to give it a “nudge,” the Gibbs sampler is more likely to become stuck in a local mode. Note that all of these results have been for a fixed β. It is more difficult to assess convergence when β is unknown. A topic for a future post!
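The role of the external field in a single-site Gibbs update can be sketched in base R. This is my own illustration of the general principle (bayesImageS implements this in compiled code; the function name and parameterisation here are assumptions):

```r
# Single-site Gibbs update for a hidden Potts label. The conditional
# probability of label j combines agreement with the neighbouring labels
# (weighted by beta) with the Gaussian log-likelihood of the observed
# pixel -- the "external field" that dominates the update and nudges the
# chain towards the correct mode.
gibbsUpdate <- function(y_i, z_neigh, beta, mu, sd) {
  q <- length(mu)
  logprob <- numeric(q)
  for (j in 1:q) {
    matches <- sum(z_neigh == j)  # neighbours sharing label j
    logprob[j] <- beta * matches + dnorm(y_i, mu[j], sd[j], log = TRUE)
  }
  prob <- exp(logprob - max(logprob))  # normalise on the log scale
  sample.int(q, 1, prob = prob / sum(prob))
}

set.seed(42)
# Pixel value near mu[2], with all four neighbours in state 2: the
# external field and the neighbourhood agree, so state 2 is returned
# with overwhelming probability.
gibbsUpdate(y_i = 1.1, z_neigh = rep(2, 4), beta = 0.88,
            mu = c(-1, 1), sd = c(0.5, 0.5))
```

Setting `mu` equal for both states mimics sampling from the prior (no external field): the update is then driven by the neighbours alone, which is where the slow mixing arises.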

The `mono_cftp_Ising` function below implements monotonic CFTP for the Ising model (equivalent to the Potts model with only `q=2` states). This algorithm returns a single, unbiased sample from the Ising model for a given inverse temperature, β. When combined with the exchange algorithm (Murray, Ghahramani & MacKay, 2006), this enables exact posterior inference for β. However, problems can occur when the value of β is too large, since the underlying single-site Gibbs sampler can fail to converge.

Previously, I’ve compared the Gibbs sampler with Swendsen-Wang for the Potts model, as implemented in my R package bayesImageS. I showed that the Gibbs sampler exhibits torpid mixing when β is larger than the critical value, β_crit = log(1 + √q). This slowdown can be quantified using CFTP, since it provides an accurate estimate of how many iterations an MCMC algorithm takes to converge.

Below the critical point, runtime is less than a second for 25,200 iterations of random scan, single-site Gibbs updates (T=5 recursions). The coalescence time roughly doubles with every increase in β (log-linear). At β=0.88, the average runtime is around 6 seconds for between 100k and 800k iterations. This trend accelerates beyond the critical point, requiring almost an hour for up to 838 **million** iterations to converge. This is for an image with only n=400 pixels. This torpid mixing can play havoc with the exchange algorithm, as you can imagine. For example, see the numerical results reported by McGrory, Titterington, Reeves & Pettitt in their 2009 paper.

CFTP might not be all that useful in practice, but it forms the basis for more advanced algorithms such as the perfect slice sampler of Mira, Møller & Roberts (2001) or the bounding chain for Swendsen-Wang (Huber, 2003). My code for the perfect slice sampler in the gist below appears to have a bug, but `mono_cftp_Ising` should be working fine.

Matti Vihola (Jyväskylä)

Matti presented an importance sampling (IS) correction for MCMC chains that do not target the exact posterior distribution (arXiv 1609.02541). This has advantages over other “exact-approximate” methods, such as delayed acceptance (DA), because the IS correction can be performed offline, in parallel. This has great potential to be combined with Bayesian indirect likelihood (BIL) methods, such as my surrogate model for ABC or Gaussian process approximation for pseudo-marginal MCMC.

Murray Pollock (Warwick)

Murray presented some algorithms for combining inference from multiple, distributed sub-posteriors. The essential idea is to run a coupled Markov chain for each sub-posterior. When the chains coalesce, this gives an unbiased sample from the combined posterior distribution. These parallel methods are related to the SCALE algorithm (arXiv 1609.03436).

Tamara Broderick (MIT)

Tamara’s talk combined ideas from Comp. Sci. for dimension reduction with statistical algorithms for Bayesian inference. A *coreset* is a weighted subsample of the data, which is intended to provide a low-dimensional representation while minimising information loss. This can provide superior results over naïve subsampling methods, such as stochastic gradient descent (SGD) or stochastic gradient Langevin dynamics (SGLD). More details are available in her NIPS 2016 paper and the article homepage.

Helen Ogden (Southampton)

Helen presented some theoretical results for convergence of Laplace approximations to latent variable models and composite likelihood for the Ising model (arXiv 1601.07911). In both cases, she measures the approximation error using the distance from the score function of the true model. For the Laplace approximation, the number of observations needs to grow at a rate proportional to the number of latent variables. She also showed that the reduced dependence approximation (RDA) has polynomial computational cost when the inverse temperature β is below the critical value. Some of these ideas have been implemented in her R package **glmmsr** (available on CRAN).

Håvard Rue (KAUST)

This talk was particularly interesting for the discussion of diminishing returns from sparse matrix representations as dimension increases. Integrated, nested Laplace approximations (INLA) have enjoyed great success for approximate Bayesian inference on generalised linear models (GLM) or generalised additive models (GAM), particularly with 1D (HMM) or 2D (MRF) correlation structures.

Christian Robert (U. Paris Dauphine & U. Warwick)

Approximate Bayesian computation (ABC) is crucially dependent on the choice of distance function between the observations and pseudo-data. X’ian showed that the earth mover’s distance (EMD) or Wasserstein metric has some particularly useful properties in this context (arXiv 1701.05146). For some models, the Wasserstein metric can be computed directly from the parameter values, without any need for simulation of pseudo-data.

Kerrie Mengersen (QUT)

Kerrie discussed the issue of geographic transferability for ecological models. The key question is how much a model trained in one specific context can be generalised to other settings, for example as an informative prior, through a hierarchical model, or in the experimental design. Difficulties arise when covariates are missing in one location, or measured in a different way. Approaches include history matching for partially-informative priors (arXiv 1605.08860) and decision-theoretic subsampling of data (*Stat. Sci.* 2017).

The first problem that I ran into was that the gfortran 6.1 install package isn’t signed, so you can’t install it without admin access. Not a good start! For many people, this will be reason enough to avoid upgrading to R 3.4.x for the time being.

The new toolchain comes with an unofficial build, clang-4.0.0-darwin15.6-Release.tar.gz, since the official release of clang 4.0.0 is only available for macOS 10.12 and above. It replaces Apple LLVM version 8.1.0, which I had previously installed with Xcode 8.3.2. These are the two versions:

$ /usr/local/clang4/bin/clang --version
clang version 4.0.0 (tags/RELEASE_400/final)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /usr/local/clang4/bin

$ /usr/bin/clang --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Both versions of clang claim to support POSIX threads, but I know that the Apple LLVM version doesn’t support OpenMP pragma directives. I was also curious to see whether Stan and CUDA would continue to work with the new toolchain, since they have both been very fussy about compilers in the past.

I updated my ~/.R/Makevars as advised, to point to the new compilers:

```
CC=/usr/local/clang4/bin/clang
CXX=/usr/local/clang4/bin/clang++
LDFLAGS=-L/usr/local/clang4/lib -fopenmp
```

Stan picked up the new version of clang:

/usr/local/clang4/bin/clang++ ... -I/usr/local/include -fPIC -mtune=core2 -O3 -fopenmp -c file5d687a875d0d.cpp ... <truncated>

However, I got some weird errors when I tried to run a Stan model:

Error in sampler$call_sampler(args_list[[i]]) : empty_nested() must be true before calling recover_memory()

I’m not sure what’s going on here. I tried installing both Rcpp and rstan from source, but haven’t had any luck.

CUDA 8.0.61 requires Xcode 8.2 (Apple LLVM 8.0.0) on macOS 10.12 (Sierra). As expected, this resulted in the following error message when compiling the R package **gputools** from source:

nvcc fatal : The version ('80100') of the host compiler ('Apple clang') is not supported

The final test was to try compiling both of my own R packages from source, **bayesImageS** and **serrsBayes**. I downloaded the source package bayesImageS_0.4-0 from CRAN and it compiled fine using clang 4.0.0. I ran the SMC-ABC example from the README and was pleased to see up to 800% CPU utilisation (indicating all 8 cores on my i7 were fully utilised). This indicates that OpenMP is working on clang 4.0.0, which would be a great reason to switch (if not for all of the other problems that I encountered!)

I was also able to compile the (pre-release) version of serrsBayes_0.3-4 and run the example from the package documentation. CPU utilisation was around 600% for fitSpectraSMC, which indicates that OpenMP is working for this R package as well.

In other news, all 51 discussions (including mine) of “Beyond subjective and objective in statistics” by Gelman & Hennig (JRSS A, 2017) are now available online. Plenty of thoughtful commentary on the philosophy of science and statistics in particular.

SMC 2017 workshop, **Aug. 31 – Sept. 1**, 2017

Norrlands Nation, Uppsala Universitet

Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, comprising the locations and amplitudes of its peaks. The presence of a large, nonuniform background presents a major challenge to analysis of these spectra. We introduce a sequential Monte Carlo (SMC) algorithm to separate the observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian or Voigt functions, while the baseline is estimated using a penalised cubic spline. Our model-based approach accounts for differences in resolution and experimental conditions. We incorporate prior information to improve identifiability and regularise the solution. By utilising this representation in a Bayesian functional regression, we can quantify the relationship between molecular concentration and peak intensity, resulting in an improved estimate of the limit of detection. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. These methods have been implemented as an R package, using RcppEigen and OpenMP.

Contributed Session 6.5: “Big Data” (Methods & Theory)

2:30pm, **Wednesday Sept. 6**

Conference Room 6/7, Technology & Innovation Centre, University of Strathclyde

There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space. Markov random fields, such as the Ising/Potts model and exponential random graph model (ERGM), are particularly challenging because the number of discrete variables increases linearly with the size of the image or graph. The likelihood of these models cannot be computed directly, due to the presence of an intractable normalising constant. In this context, it is necessary to employ algorithms that provide a suitable compromise between accuracy and computational cost.

Bayesian indirect likelihood (BIL) is a class of methods that approximate the likelihood function using a surrogate model. This model can be trained using a pre-computation step, utilising massively parallel hardware to simulate auxiliary variables. We review various types of surrogate model that can be used in BIL. In the case of the Potts model, we introduce a parametric approximation to the score function that incorporates its known properties, such as heteroskedasticity and critical temperature. We demonstrate this method on 2D satellite remote sensing and 3D computed tomography (CT) images. We achieve a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. Our algorithm has been implemented in the R package “bayesImageS,” which is available from CRAN.


To briefly recap, the mean function is characterised by the following differential equation:

$$\frac{d\mu}{dt} = r \, \mu(t) \left( 1 - \frac{\mu(t)}{L} \right)$$

which is a logistic curve with rate parameter $r$ and a horizontal asymptote at $L$. The solution for an initial value $\mu(t_C) = y_0$ is:

$$\mu(t) = \frac{y_0 \, L \, e^{r(t - t_C)}}{L + y_0 \left( e^{r(t - t_C)} - 1 \right)}$$
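As a quick sanity check, the logistic solution can be compared numerically against its defining ODE in R (the parameter values here are arbitrary, chosen only for illustration):

```r
# logistic mean function: solution of dmu/dt = r*mu*(1 - mu/L)
# with initial value mu(tC) = y0 (assumed illustrative values)
r <- 2; y0 <- 5; tC <- 0; L <- 100
mu <- function(t) y0 * L * exp(r * (t - tC)) / (L + y0 * (exp(r * (t - tC)) - 1))

t <- 1.3; h <- 1e-6
lhs <- (mu(t + h) - mu(t - h)) / (2 * h)  # central-difference derivative
rhs <- r * mu(t) * (1 - mu(t) / L)        # right-hand side of the ODE
abs(lhs - rhs)                            # agrees to numerical precision
```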

The observations are generated from a heteroskedastic, truncated normal distribution where the variance is equal to the gradient of the mean:

$$y_{i,j} \sim \mathcal{N}_{[0,\,L]} \left( \mu(t_i),\; \sigma^2(t_i) \right), \qquad \sigma^2(t_i) = \left. \frac{d\mu}{dt} \right|_{t = t_i}$$

Simulating data from this model is straightforward, for example using the rtruncnorm() function from the R package **truncnorm**:
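The original simulation code is not reproduced here, but a self-contained sketch in base R conveys the idea. The parameter values are assumed for illustration, and an inverse-CDF sampler stands in for rtruncnorm so that no extra package is needed:

```r
# truncated normal sampler on [a, b] via the inverse-CDF method
# (the truncnorm package provides rtruncnorm for the same purpose)
rtnorm <- function(n, mean, sd, a, b) {
  u <- runif(n, pnorm(a, mean, sd), pnorm(b, mean, sd))
  qnorm(u, mean, sd)
}

set.seed(42)
r <- 2; y0 <- 5; tC <- 0; L <- 100   # assumed illustrative values
t <- seq(0, 5, by = 0.25)
ert <- exp(r * (t - tC))
mu <- y0 * L * ert / (L + y0 * (ert - 1))  # logistic mean (ft in the Stan code)
sdev <- r * L * exp(r * (t + tC)) / (exp(r * tC) + exp(r * t))^2  # dfdt
y <- rtnorm(length(t), mean = mu, sd = sdev, a = 0, b = L)
```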

This is what it looks like in the Stan modelling language (which was influenced by BUGS):

```stan
functions {
  vector ft(vector t, real r, real y0, real tC, real L) {
    vector[num_elements(t)] mu;
    vector[num_elements(t)] exprt;
    exprt = exp(r*(t - tC));
    mu = y0*L*exprt ./ (L + y0*(exprt - 1));
    return mu;
  }
  vector dfdt(vector t, real r, real tC, real L) {
    vector[num_elements(t)] dmu;
    vector[num_elements(t)] sddenom;
    for (i in 1:num_elements(t)) {
      sddenom[i] = ((exp(r*tC) + exp(r*t[i]))^(-2));
    }
    dmu = r*L*exp(r*(t + tC)) .* sddenom;
    return dmu;
  }
}
data {
  int<lower = 1> M;
  int<lower = 1> N;
  real<lower = 0> mu0;
  real<lower = 1> maxY;
  real tcrit;
  matrix<lower=0, upper=maxY>[M,N] y;
  vector[M] t;
}
parameters {
  real<lower = 0> r;
}
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_sd;
  curr_mu = ft(t, r, mu0, tcrit, maxY);
  curr_sd = dfdt(t, r, tcrit, maxY);
}
model {
  for (i in 1:M) {
    for (j in 1:N) {
      y[i,j] ~ normal(curr_mu[i], curr_sd[i]) T[0,maxY];
    }
  }
}
```

When we try to fit this model, Stan gives the following error:

```
[1] "The following numerical problems occured the indicated number of times after warmup on chain 1"
                                                                                count
Exception thrown at line 28: normal_log: Scale parameter is 0, but must be > 0!    35
```

This indicates that there are problems with numerical precision in the tails of the distribution, as $\frac{d\mu}{dt} \to 0$. To avoid this, I added a lower bound on the standard deviation:

curr_sd = dfdt(t, r, tcrit, maxY) + eps;
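To see why the scale can be exactly zero rather than merely small, dfdt can be transcribed into R: once rt is large enough, the squared denominator overflows to Inf while the numerator stays finite, so the ratio underflows to exactly 0 (the parameter values below are illustrative, not the original data):

```r
# dfdt from the Stan model, transcribed to R (tC = 0, L = 100 assumed)
r <- 2; tC <- 0; L <- 100
dfdt <- function(t) r * L * exp(r * (t + tC)) / (exp(r * tC) + exp(r * t))^2

dfdt(20)   # tiny, but still positive
dfdt(350)  # numerator finite, denominator Inf: exactly 0
```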

Even after fixing this problem, I still had the dreaded *divergent transitions*:

```
Warning messages:
1: There were 2155 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```

I tried increasing adapt_delta as suggested, but I still got divergent transitions even at adapt_delta=0.99. Even adding a prior on r didn’t really help. In the end, I dropped the truncated normal distribution:

```stan
model {
  for (i in 1:M) {
    y[i,] ~ normal(curr_mu[i], curr_sd[i]);
  }
}
```

Even though this meant that the model was slightly misspecified, the true value of r was now well within the region of highest posterior density:

```
Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

  mean se_mean   sd 2.5%  25%  50% 75% 97.5% n_eff Rhat
r 1.98       0 0.02 1.95 1.97 1.98  2  2.02  3079    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 15:17:27 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

I can modify the model so that both r and mu0 are treated as unknown parameters:

```stan
parameters {
  real<lower = 0> r;
  real<lower = 0> mu0;
}
```

In this case, with r=2.5 and mu0=4, the true values are within the region of posterior support:

```
Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

    mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
r   2.43    0.00 0.04 2.35 2.41 2.43 2.45  2.50  1761    1
mu0 3.80    0.01 0.30 3.24 3.59 3.79 3.99  4.42  1702    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 16:00:23 2017.
```
