
bayesImageS::mcmcPottsNoData(..)

January 17, 2017

This is a follow-up to my previous post about the Swendsen-Wang (SW) algorithm, where I mentioned that SW has better convergence properties than Gibbs sampling when the inverse temperature parameter β is large. This difference can be quantified by initialising the two algorithms at known starting points and measuring how many iterations each takes to converge. This is the second in a series of posts describing the functions and algorithms that I have implemented in the R package bayesImageS, which is now available on CRAN.

The Potts model has a doubly-intractable likelihood, so its expectation and variance cannot be computed exactly. Instead, we can use Markov chain Monte Carlo (MCMC) algorithms such as SW or Gibbs sampling to simulate from its distribution for a given value of β. However, we need to know how many MCMC iterations to use, so that the chain will have converged to a steady state. Otherwise, any inference using the MCMC samples will be biased.

In the following, the labels z of the Potts model can take k different values. This state space is not ordered, so algorithms such as perfect sampling (Propp & Wilson, 1996; Huber, 2016) cannot be applied. The Potts model is a member of the exponential family, so it has a sufficient statistic S(z), which is the count of like neighbours. The maximum value of S(z), which we will call M, is equal to 2(n − √n) for a regular, square lattice of n pixels. For example, M = 112 for an 8×8 lattice; M = 31,000 for 125×125; and M = 1,998,000 for 1000×1000.
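
For reference, M is easy to compute directly from the lattice dimension. The small helper below reproduces the values above (maxSuffStat is just an illustrative name, not a function from the package):

# maximum of the sufficient statistic for a regular, square lattice of n pixels,
# using a first-order (4 nearest neighbours) neighbourhood: M = 2(n - sqrt(n))
maxSuffStat <- function(n) 2*(n - sqrt(n))
maxSuffStat(c(8^2, 125^2, 1000^2))
## [1]     112   31000 1998000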

There are two exceptions where the distribution of the Potts model can be computed exactly. When β=0 the labels z are independent, hence the sufficient statistic S(z) follows a Binomial distribution with expectation M/k and variance M(1/k)(1 – 1/k). For an 8×8 lattice with k=3, the expectation is 37.33 with a variance of 24.89. As β approaches infinity, all of the labels have the same value almost surely. This means that the expectation approaches M asymptotically, while the variance approaches 0.
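
To make those numbers concrete, here is the same calculation in R (a quick sketch of the arithmetic, using M = 112 from above):

# exact moments of S(z) at beta = 0 for an 8x8 lattice with k = 3 labels
M <- 2*(64 - 8)   # M = 112
M/3               # expectation
## [1] 37.33333
M*(1/3)*(2/3)     # variance
## [1] 24.88889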

We can use the endpoints of the distribution to estimate how long the SW and Gibbs algorithms take to converge. The algorithm is initialised at one endpoint, then we monitor S(z) at each iteration until the distribution of the samples has converged to the known expectation and variance. First, let’s look at chequerboard Gibbs sampling for an 8×8 lattice with k=3:

library(PottsUtils)

k <- 3
n <- 8*8
mask <- matrix(1,nrow=sqrt(n),ncol=sqrt(n))
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
print(paste(sum(mask),"pixels"))
## [1] "64 pixels"
print(paste("maximum sufficient statistic S(z) =",nrow(edges)))
## [1] "maximum sufficient statistic S(z) = 112"
library(bayesImageS)

res.Gibbs <- mcmcPottsNoData(beta=5, k=3, neigh, block, niter=50)
ts.plot(res.Gibbs$sum, ylim=c(nrow(edges)/3, nrow(edges)))
abline(h=nrow(edges), col=2, lty=3)

[Figure: trace plot of S(z) for chequerboard Gibbs on the 8×8 lattice, with the maximum of 112 shown as a dotted line]

summary(res.Gibbs$sum[26:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     112     112     112     112     112     112
var(res.Gibbs$sum[26:50])
## [1] 0
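
One way to pin down the convergence point is to find the first iteration at which S(z) attains its maximum of 112 (a rough check, which works here because once the chain reaches that state at β=5 it never leaves it):

# first iteration at which the Gibbs chain reaches S(z) = nrow(edges) = 112
min(which(res.Gibbs$sum == nrow(edges)))
# around 25 for this run, consistent with the trace plot above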

We can see that it only takes around 25 iterations for the Gibbs sampler to converge for a lattice of that size. Now for a 125×125 lattice:

n <- 125*125
mask <- matrix(1,nrow=sqrt(n),ncol=sqrt(n))
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))
print(paste(sum(mask),"pixels"))
## [1] "15625 pixels"
print(paste("maximum sufficient statistic S(z) =",nrow(edges)))
## [1] "maximum sufficient statistic S(z) = 31000"
res.Gibbs <- mcmcPottsNoData(beta=5, k=3, neigh, block, niter=2000)
ts.plot(res.Gibbs$sum, ylim=c(nrow(edges)/3, nrow(edges)))
abline(h=nrow(edges), col=2, lty=3)

[Figure: trace plot of S(z) for chequerboard Gibbs on the 125×125 lattice, with the maximum of 31,000 shown as a dotted line]

summary(res.Gibbs$sum[1001:2000])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   30740   30780   30810   30810   30830   30890
var(res.Gibbs$sum[1001:2000])
## [1] 1440.661

Even after discarding the first 1000 iterations as burn-in, the distribution of S(z) still has not converged to the known value: the chain has yet to reach the maximum of 31,000 and its variance is far from zero. Now let’s see how Swendsen-Wang performs for the same lattice:

res.SW <- swNoData(beta=5, k=3, neigh, block, niter=50)
ts.plot(res.SW$sum, ylim=c(nrow(edges)/3, nrow(edges)))
abline(h=nrow(edges), col=2, lty=3)

[Figure: trace plot of S(z) for Swendsen-Wang on the 125×125 lattice, with the maximum of 31,000 shown as a dotted line]

summary(res.SW$sum[26:50])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   31000   31000   31000   31000   31000   31000
var(res.SW$sum[26:50])
## [1] 0

After 25 iterations, SW has already converged to the exact distribution. Even though each SW iteration is much more computationally expensive than a Gibbs iteration, it more than makes up for that in efficiency when β is large. Now let’s see what happens when we go in the other direction: initialising the lattice with all of the labels set to the same value, then updating with β=0:

res2.Gibbs <- mcmcPottsNoData(beta=0, k=3, neigh, block, niter=500, random=FALSE)
ts.plot(res2.Gibbs$sum, ylim=range(c(res2.Gibbs$sum, nrow(edges))))
abline(h=nrow(edges), col=2, lty=3)
abline(h=nrow(edges)/3, col=4, lty=3)

[Figure: trace plot of S(z) for chequerboard Gibbs with β=0, with dotted lines at the maximum (31,000) and at the expected value (10,333.33)]

summary(res2.Gibbs$sum)
##        V1
##  Min.   : 9988
##  1st Qu.:10281
##  Median :10340
##  Mean   :10334
##  3rd Qu.:10389
##  Max.   :10522
var(res2.Gibbs$sum)
##          [,1]
## [1,] 6710.494

The distribution of all 500 samples is very close to the exact distribution, which has mean 10333.33 and variance 6888.89:

hist(res2.Gibbs$sum, freq=FALSE, breaks=50, col=3)
abline(v=nrow(edges)/3, col=4, lty=3, lwd=3)
curve(dnorm(x, mean=nrow(edges)/3, sd=sqrt(nrow(edges)*(1/3)*(2/3))),
          col="darkblue", lwd=2, add=TRUE, yaxt="n")

[Figure: histogram of S(z) for chequerboard Gibbs with β=0, with the expected value marked and the approximating normal density overlaid]

Now for Swendsen-Wang:

res2.SW <- swNoData(beta=0, k=3, neigh, block, niter=500, random=FALSE)
ts.plot(res2.SW$sum, ylim=range(c(res2.SW$sum, nrow(edges))))
abline(h=nrow(edges), col=2, lty=3)
abline(h=nrow(edges)/3, col=4, lty=3)

[Figure: trace plot of S(z) for Swendsen-Wang with β=0, with dotted lines at the maximum (31,000) and at the expected value (10,333.33)]

summary(res2.SW$sum)
##        V1
##  Min.   :10080
##  1st Qu.:10273
##  Median :10327
##  Mean   :10328
##  3rd Qu.:10382
##  Max.   :10542
var(res2.SW$sum)
##          [,1]
## [1,] 6740.577
hist(res2.SW$sum, freq=FALSE, breaks=50, col=3)
abline(v=nrow(edges)/3, col=4, lty=3, lwd=3)
curve(dnorm(x, mean=nrow(edges)/3, sd=sqrt(nrow(edges)*(1/3)*(2/3))),
          col="darkblue", lwd=2, add=TRUE, yaxt="n")

[Figure: histogram of S(z) for Swendsen-Wang with β=0, with the expected value marked and the approximating normal density overlaid]
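
As a final check, we can compare the sample moments of S(z) from both samplers against the exact values at β=0 (a quick sketch using the objects defined above, with M = nrow(edges) and k = 3):

# sample mean and variance of S(z) versus the exact values M/k and M(1/k)(1 - 1/k)
M <- nrow(edges)
rbind(Gibbs = c(mean = mean(res2.Gibbs$sum), var = var(as.vector(res2.Gibbs$sum))),
      SW    = c(mean = mean(res2.SW$sum),    var = var(as.vector(res2.SW$sum))),
      exact = c(mean = M/k, var = M*(1/k)*(1 - 1/k)))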

The distribution of the SW samples with β=0 is almost identical to what we obtained from the chequerboard Gibbs sampler, and matches the exact distribution very closely. Based on these results, I would be confident in using 500 iterations of SW to simulate images of this size for any value of β. One might reasonably ask whether there is any scenario where Gibbs sampling outperforms SW. The answer lies in the “NoData” part of the function name: in the presence of an external field, such as when fitting the Potts model to an observed image, the Gibbs sampler has much better performance. This is because the external field makes the full-conditional distributions of the individual pixels inhomogeneous.

References

Geman, S. & Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. PAMI 6: 721-741.

Huber, M. (2016) Perfect Simulation. Chapman & Hall/CRC Press.

Moores, M.T., Pettitt, A.N. & Mengersen, K. (2015) Scalable Bayesian Inference for the Inverse Temperature of a Hidden Potts Model. arXiv preprint arXiv:1503.08066 [stat.CO]

Moores, M.T. & Mengersen, K. (2016) bayesImageS: Bayesian Methods for Image Segmentation using a Potts Model. R package version 0.3-4.

Propp, J. G. & Wilson, D. B. (1996) Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algor. 9(1-2): 223-252.

Swendsen, R.H. & Wang, J.-S. (1987) Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 58(2): 86–88.
