bayesImageS::mcmcPottsNoData(..)
This is a follow up to my previous post about the Swendsen-Wang (SW) algorithm, where I mentioned that SW has better convergence properties than Gibbs when the inverse temperature parameter β is large. This difference can be quantified by initialising the two algorithms at known starting points and measuring how many iterations it takes to converge. This is the second in a series of posts describing the functions and algorithms that I have implemented in the R package bayesImageS, which is now available on CRAN.
The Potts model has a doubly-intractable likelihood, so its expectation and variance cannot be computed exactly. Instead, we can use Markov chain Monte Carlo (MCMC) algorithms such as SW or Gibbs sampling to simulate from its distribution for a given value of β. However, we need to know how many MCMC iterations to use, so that the chain will have converged to a steady state. Otherwise, any inference using the MCMC samples will be biased.
In the following, the labels z of the Potts model can take k different values. This state space is not ordered, so algorithms such as perfect sampling (Propp & Wilson, 1996; Huber, 2016) cannot be applied. The Potts model is a member of the exponential family, so it has a sufficient statistic S(z) which is the count of like neighbours. The maximum value of S(z), which we will call M, is equal to 2(n − √n) for a regular, square lattice. For example, M = 112 for an 8×8 lattice; M = 31,000 for 125×125; and M = 1,998,000 for 1000×1000.
There are two exceptions where the distribution of the Potts model can be computed exactly. When β=0 the labels z are independent, hence the sufficient statistic S(z) follows a Binomial distribution with expectation M/k and variance M(1/k)(1 – 1/k). For an 8×8 lattice with k=3, the expectation is 37.33 with a variance of 24.89. As β approaches infinity, all of the labels have the same value almost surely. This means that the expectation approaches M asymptotically, while the variance approaches 0.
We can use the endpoints of the distribution to estimate how long the SW and Gibbs algorithms take to converge. The algorithm is initialised at one endpoint, then we monitor S(z) at each iteration until the distribution of the samples has converged to the known expectation and variance. First, let’s look at chequerboard Gibbs sampling for an 8×8 lattice with k=3:
library(PottsUtils) k <- 3 n <- 8*8 mask <- matrix(1,nrow=sqrt(n),ncol=sqrt(n)) neigh <- getNeighbors(mask, c(2,2,0,0)) block <- getBlocks(mask, 2) edges <- getEdges(mask, c(2,2,0,0)) print(paste(sum(mask),"pixels"))
## [1] "64 pixels"
print(paste("maximum sufficient statistic S(z) =",nrow(edges)))
## [1] "maximum sufficient statistic S(z) = 112"
library(bayesImageS) res.Gibbs <- mcmcPottsNoData(beta=5, k=3, neigh, block, niter=50) ts.plot(res.Gibbs$sum, ylim=c(nrow(edges)/3, nrow(edges))) abline(h=nrow(edges), col=2, lty=3)

summary(res.Gibbs$sum[26:50])
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 112 112 112 112 112 112
var(res.Gibbs$sum[26:50])
## [1] 0
We can see that it only takes around 25 iterations for the Gibbs sampler to converge for a lattice of that size. Now for a 125×125 lattice:
n <- 125*125 mask <- matrix(1,nrow=sqrt(n),ncol=sqrt(n)) neigh <- getNeighbors(mask, c(2,2,0,0)) block <- getBlocks(mask, 2) edges <- getEdges(mask, c(2,2,0,0)) print(paste(sum(mask),"pixels"))
## [1] "15625 pixels"
print(paste("maximum sufficient statistic S(z) =",nrow(edges)))
## [1] "maximum sufficient statistic S(z) = 31000"
res.Gibbs <- mcmcPottsNoData(beta=5, k=3, neigh, block, niter=2000) ts.plot(res.Gibbs$sum, ylim=c(nrow(edges)/3, nrow(edges))) abline(h=nrow(edges), col=2, lty=3)

summary(res.Gibbs$sum[1001:2000])
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 30740 30780 30810 30810 30830 30890
var(res.Gibbs$sum[1001:2000])
## [1] 1440.661
Even after 1000 iterations, the distribution of S(z) might not have converged to the known value. Now let’s see how Swendsen-Wang performs for the same lattice:
res.SW <- swNoData(beta=5, k=3, neigh, block, niter=50) ts.plot(res.SW$sum, ylim=c(nrow(edges)/3, nrow(edges))) abline(h=nrow(edges), col=2, lty=3)

summary(res.SW$sum[26:50])
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 31000 31000 31000 31000 31000 31000
var(res.SW$sum[26:50])
## [1] 0
After 25 iterations, SW has already converged to the exact distribution. Even though this algorithm is much more expensive for each iteration, it more than makes up for that in efficiency when β is large. Now let’s see what happens when we go in the other direction: initialising the lattice with all labels set to the same value, then updating with β=0:
res2.Gibbs <- mcmcPottsNoData(beta=0, k=3, neigh, block, niter=500, random=FALSE) ts.plot(res2.Gibbs$sum, ylim=range(c(res2.Gibbs$sum, nrow(edges)))) abline(h=nrow(edges), col=2, lty=3) abline(h=nrow(edges)/3, col=4, lty=3)

summary(res2.Gibbs$sum)
## V1 ## Min. : 9988 ## 1st Qu.:10281 ## Median :10340 ## Mean :10334 ## 3rd Qu.:10389 ## Max. :10522
var(res2.Gibbs$sum)
## [,1] ## [1,] 6710.494
The distribution of all 500 samples are very close to the exact distribution with mean 10333.33 and variance 6888.89:
hist(res2.Gibbs$sum, freq=FALSE, breaks=50, col=3)
abline(v=nrow(edges)/3, col=4, lty=3, lwd=3)
curve(dnorm(x, mean=nrow(edges)/3, sd=sqrt(nrow(edges)*(1/3)*(2/3))),
col="darkblue", lwd=2, add=TRUE, yaxt="n")

Now for Swendsen-Wang:
res2.SW <- swNoData(beta=0, k=3, neigh, block, niter=500, random=FALSE) ts.plot(res2.SW$sum, ylim=range(c(res2.SW$sum, nrow(edges)))) abline(h=nrow(edges), col=2, lty=3) abline(h=nrow(edges)/3, col=4, lty=3)

summary(res2.SW$sum)
## V1 ## Min. :10080 ## 1st Qu.:10273 ## Median :10327 ## Mean :10328 ## 3rd Qu.:10382 ## Max. :10542
var(res2.SW$sum)
## [,1] ## [1,] 6740.577
hist(res2.SW$sum, freq=FALSE, breaks=50, col=3)
abline(v=nrow(edges)/3, col=4, lty=3, lwd=3)
curve(dnorm(x, mean=nrow(edges)/3, sd=sqrt(nrow(edges)*(1/3)*(2/3))),
col="darkblue", lwd=2, add=TRUE, yaxt="n")

The distribution of the SW samples with β=0 is almost identical to what we obtained from the chequerboard Gibbs sampler and matches the exact distribution very closely. Based on these results, I would be confident in using 500 iterations of SW to simulate images of this size for any value of β. One might reasonably ask if there is any scenario where Gibbs sampling outperforms SW. The answer lies in the “NoData” part of the function name: in the presence of an external field, such as when fitting the Potts model to an observed image, the Gibbs sampler will have much better performance. This is due to the inhomogeneity of the distributions of each pixel.
References
Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images IEEE Trans. PAMI 6: 721-741.
Huber, M. (2016) Perfect SimulationChapman & Hall/CRC Press
Moores, M.T.; Pettitt, A.N. & Mengersen, K. (2015) Scalable Bayesian Inference for the Inverse Temperature of a Hidden Potts Model arXiv preprint arXiv:1503.08066 [stat.CO]
Moores, M.T. & Mengersen, K. (2016) bayesImageS: Bayesian Methods for Image Segmentation using a Potts Model R package v0.3-4
Propp, J. G. & Wilson, D. B. (1996) Exact sampling with coupled Markov chains and applications to statistical mechanics Random Struct. Algor. 9(1-2): 223-252.
Swendsen, R.H. & Wang, J-S (1987) Nonuniversal critical dynamics in Monte Carlo simulations Phys. Rev. Lett. 58(2): 86–88.
Trackbacks & Pingbacks