Talk at QUT on Friday August 7

Simultaneous baseline correction and peak fitting of a SERRS spectrum using an SMC algorithm

Title: Lorentzian mixture model for Raman spectroscopy

Speaker: Matt Moores (University of Warwick, UK)

Date: Friday 7 August 2015

Time: 3:00pm – 4:00pm

Location: O603, Gardens Point Campus, Queensland University of Technology

Abstract: Surface-enhanced resonance Raman scattering (SERRS) is an optical spectroscopic technique for identifying labelled molecules. A SERRS signal can be represented as a mixture of Lorentzian peaks plus a smoothly-varying baseline due to background fluorescence. The multivariate observations are thus highly collinear and lend themselves to a reduced-rank representation. We introduce a sequential Monte Carlo algorithm for joint estimation of the baseline and peaks. Our model-based approach accounts for batch effects between technical replicates. By analysing a large number of Raman-active dyes with similar chemical structure, we can characterise how structural differences are represented as changes in the spectral signature. Our model can be used for identification and quantification of dye labels in a multiplex, as well as hierarchical clustering of dyes according to structural similarity.

This is joint work with Mark Girolami (Warwick), Kirsten Gracie, Karen Faulds and Duncan Graham (U. Strathclyde).
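
To make the signal model in the abstract concrete, here is a minimal sketch of a spectrum built from Lorentzian peaks plus a smooth baseline. It is an illustration only, not the model or code from the talk: the peak parameterisation (height `amplitude`, half-width `scale`), the linear baseline and all numeric values are assumptions chosen for the example.

```python
def lorentzian(x, location, scale, amplitude):
    """Lorentzian peak centred at `location`.

    Assumed parameterisation: `amplitude` is the peak height and
    `scale` is the half-width at half-maximum.
    """
    return amplitude * scale ** 2 / ((x - location) ** 2 + scale ** 2)


def spectrum(x, peaks, baseline):
    """Noise-free signal at wavenumber x: smooth baseline plus a sum of peaks."""
    return baseline(x) + sum(lorentzian(x, *peak) for peak in peaks)


# Hypothetical peaks as (location, scale, amplitude) tuples.
peaks = [(1000.0, 10.0, 500.0), (1350.0, 15.0, 300.0)]


def baseline(x):
    """Hypothetical slowly-varying baseline standing in for fluorescence."""
    return 200.0 + 0.05 * x


# Evaluate the signal on an evenly spaced grid of wavenumbers.
wavenumbers = [400.0 + 5.0 * i for i in range(281)]
signal = [spectrum(w, peaks, baseline) for w in wavenumbers]
```

Fitting would run in the opposite direction: given a noisy `signal`, estimate the peak parameters and the baseline jointly, which is the task that the sequential Monte Carlo algorithm addresses.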

Graduation Ceremony

I was very glad that I was able to attend my PhD graduation ceremony in Brisbane last week. My extended family were there to cheer me on, as well as both of my supervisors and my dear friend, Kal. Out of 28 doctorates there were a dozen from maths, including four statisticians. Unfortunately none of us snagged an Outstanding Doctoral Thesis Award, but it was nice to be nominated!

Once again I would like to thank my supervisors Kerrie Mengersen & Fiona Harden for their mentorship, as well as my co-authors Cathy Hargrave, Chris Drovandi, Christian Robert, Tony Pettitt, Tim Deegan & Mike Poulsen for their contributions to my research.

QUT graduation ceremony, July 23, 2015

academic regalia

International Workshop on Monte Carlo Methods for Spatial Stochastic Systems

I will be presenting a talk at the ACEMS International Workshop on Monte Carlo Methods for Spatial Stochastic Systems (MCMSS) at the University of Queensland, Brisbane, July 21-23 (abstract below). Other speakers include Gareth Roberts, Adrian Baddeley, Robert Kohn & Kevin Burrage. The workshop programme is now available online.

I’ll also be giving a practice talk at the Warwick Young Researchers’ Meeting (YRM) on June 30 and presenting an invited talk for the QUT Mathematical Sciences School on August 7.

Scalable Inference for the Inverse Temperature of a Hidden Potts Model
The Potts model is a discrete Markov random field that can be used to label the pixels in an image according to an unobserved classification. The strength of spatial dependence between neighbouring labels is governed by the inverse temperature parameter. This parameter is difficult to estimate, due to its dependence on an intractable normalising constant. Several approaches have been proposed, including the exchange algorithm and approximate Bayesian computation (ABC), but these algorithms do not scale well for images with a million or more pixels. We introduce a precomputed binding function, which improves the elapsed runtime of these algorithms by two orders of magnitude. Our method enables fast, approximate Bayesian inference for computed tomography (CT) scans and satellite imagery.

This is joint work with Kerrie Mengersen, Tony Pettitt and Chris Drovandi at QUT, and Christian Robert at the University of Warwick and Université Paris Dauphine:

Moores, Pettitt & Mengersen (2015) “Scalable Bayesian Inference for the Inverse Temperature of a Hidden Potts Model” arXiv:1503.08066 [stat.CO]

Moores, Drovandi, Mengersen & Robert (2015) “Pre-processing for approximate Bayesian computation in image analysis” Statistics & Computing 25(1): 23-33. DOI: 10.1007/s11222-014-9525-6
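
To illustrate why the normalising constant mentioned in the abstract is intractable, the sketch below computes it by brute force for a tiny lattice. It assumes a first-order (four-nearest-neighbour) neighbourhood and a Potts density proportional to exp(beta × number of like neighbour pairs); the function names are hypothetical and this is not the precomputed binding function from the papers.

```python
import itertools
import math


def like_neighbours(z):
    """Count horizontally or vertically adjacent pixel pairs sharing a label."""
    rows, cols = len(z), len(z[0])
    total = 0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows and z[i][j] == z[i + 1][j]:
                total += 1
            if j + 1 < cols and z[i][j] == z[i][j + 1]:
                total += 1
    return total


def log_normalising_constant(rows, cols, k, beta):
    """Sum exp(beta * S(z)) over all k**(rows*cols) labellings (tiny lattices only)."""
    total = 0.0
    for labels in itertools.product(range(k), repeat=rows * cols):
        # Reshape the flat tuple of labels into a rows-by-cols lattice.
        z = [labels[r * cols:(r + 1) * cols] for r in range(rows)]
        total += math.exp(beta * like_neighbours(z))
    return math.log(total)


# A 3x3 lattice with 3 labels already needs 3**9 = 19683 terms.
log_c = log_normalising_constant(3, 3, 3, beta=0.5)
```

The number of terms grows as k raised to the number of pixels, so exact evaluation is out of reach long before an image reaches a million pixels.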

R package bayesImageS

My R package, bayesImageS version 0.1-21, is now available online. It uses a hidden Potts model with additive Gaussian noise for image segmentation of 2D and 3D datasets. The latent labels z can be simulated using chequerboard updating or the Swendsen-Wang algorithm. Several methods for full Bayesian inference with intractable likelihoods are supported, including pseudolikelihood, the exchange algorithm, path sampling, and approximate Bayesian computation (ABC-MCMC & ABC-SMC).

The R source package is released under the GNU General Public License, version 2. Its computational engine is implemented in C++ using RcppArmadillo and OpenMP. There is zero documentation available at present, but I’m working on that…
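
For readers unfamiliar with chequerboard updating, here is a rough sketch of one Gibbs sweep over the latent labels of a hidden Potts model with Gaussian noise. It is written in plain Python for illustration and is not the bayesImageS API or its C++ implementation: the function names, the argument layout and the first-order neighbourhood are all assumptions.

```python
import math
import random


def neighbours(i, j, rows, cols):
    """Yield the coordinates of the first-order neighbours of pixel (i, j)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def chequerboard_sweep(z, y, mu, sigma, beta, rng):
    """Update every label in z once, one chequerboard colour at a time.

    Pixels of the same colour are never neighbours, so they are conditionally
    independent given the other colour and could be updated in parallel.
    """
    rows, cols = len(z), len(z[0])
    k = len(mu)
    for colour in (0, 1):
        for i in range(rows):
            for j in range(cols):
                if (i + j) % 2 != colour:
                    continue
                log_w = []
                for label in range(k):
                    same = sum(1 for ni, nj in neighbours(i, j, rows, cols)
                               if z[ni][nj] == label)
                    # Gaussian log-likelihood of y[i][j], up to a constant.
                    resid = (y[i][j] - mu[label]) / sigma[label]
                    log_lik = -0.5 * resid ** 2 - math.log(sigma[label])
                    log_w.append(beta * same + log_lik)
                # Subtract the maximum before exponentiating, for stability.
                top = max(log_w)
                weights = [math.exp(v - top) for v in log_w]
                z[i][j] = rng.choices(range(k), weights=weights)[0]
    return z


# Hypothetical 3x3 example with two well-separated classes.
truth = [[0, 0, 1], [0, 1, 1], [1, 1, 1]]
y = [[10.0 * label for label in row] for row in truth]
z = [[0] * 3 for _ in range(3)]
chequerboard_sweep(z, y, mu=[0.0, 10.0], sigma=[0.1, 0.1], beta=0.5,
                   rng=random.Random(1))
```

In a full sampler this sweep would alternate with updates of the noise parameters and of the inverse temperature, which is where the intractable likelihood methods listed above come in.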

Thesis submitted, articles in press

The process for Australian doctoral candidates is different from Europe and elsewhere. Due mainly to geographic isolation, there is no “viva voce” where external examiners are invited to interrogate the candidate in person. Rather, there is a public seminar followed by a meeting with the university thesis committee. Once recommended changes to the thesis have been made, it is sent to the external examiners for review. So if you’ve recently received my thesis in your inbox, then thanks in advance!

Some of my thesis papers have recently appeared online:

“Pre-processing for approximate Bayesian computation in image analysis”
Moores, Drovandi, Mengersen & Robert (2015)
Statistics & Computing 25(1): 23-33. DOI: 10.1007/s11222-014-9525-6

“An external field prior for the hidden Potts model, with application to cone-beam computed tomography”
Moores, Hargrave, Deegan, Poulsen, Harden & Mengersen (2015)
Computational Statistics & Data Analysis 86: 27–41. DOI: 10.1016/j.csda.2014.12.001

Software Inventory for Mac OS X

Previously I listed all of the software that I usually install on Windows 7. Now that I’ve got an iMac with OS X Mavericks, it’s time to give an updated list:

From the App Store

Follow these instructions for installing the Command Line Tools (you don’t need a full install of Xcode on Mavericks)

xcode-select --install

From other websites

R 3.1.1
RStudio 0.98.1056
MacTeX 2014 (including TeX Live & TeXShop)
XQuartz 2.7.7 (X11 windowing system for OS X)
Skype 6
MS Office 2011
Stata/SE 14

gcc and gfortran 4.9.0 from “High Performance Computing for Mac OS X”

  • requires sudo to untar into /usr/local/bin
  • don’t forget to edit your Makeconf using these instructions

Problems with Gatekeeper

JabRef 2.10
ImageJ 1.48

Big Data, Big Models, it is a Big Deal


I’ve submitted a poster to the Network on Computational Statistics and Machine Learning (NCSML) workshop “Big Data, Big Models, it is a Big Deal” at the University of Warwick on the 1st & 2nd of September. More details are available from the workshop homepage. My abstract is as follows:

Scalable Bayesian computation for intractable likelihoods in image analysis

The availability of inexpensive, high-quality imaging has given scientists the capacity to generate more data than ever before. In medicine, some patients are scanned daily throughout their course of treatment, to monitor their progress as well as for image-guided therapies. Satellites such as Landsat and MODIS orbit the globe, regularly providing remotely-sensed imagery of the Earth’s surface. Automated methods of image analysis are vital in order to keep pace with the volumes of data that are generated in these settings. Increases in image resolution and sample depth have improved the quality of these images, but this has also resulted in a vast increase in the size of the digital representation. Many methods that were originally developed for much smaller images are infeasible for the image dimensions that are required by current applications. Thus, the scalability of automated methods to meet the needs of real-world data is a major concern.

The hidden Potts model is widely applied in image analysis to segment the image pixels and label them according to their underlying classification. The inverse temperature parameter of this model governs the strength of spatial cohesion and therefore has a substantial influence over the resulting model fit. Estimating this parameter is difficult because the intractable normalising constant depends on the value of the inverse temperature, so there is no closed-form solution for sampling from the distribution directly. We review three computational approaches for addressing this issue, namely pseudolikelihood, path sampling, and the approximate exchange algorithm. We compare the accuracy and scalability of these methods using a simulation study.

This is joint work with Clair Alston and Kerrie Mengersen, Queensland University of Technology, Australia.
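
As an illustration of the first of the three approaches, the sketch below evaluates the log-pseudolikelihood of the inverse temperature for a fixed labelling: the product of each pixel's full conditional given its neighbours, which avoids the normalising constant altogether. It assumes a first-order neighbourhood and uses a simple grid search; the function names are hypothetical and this is not the code used in the simulation study.

```python
import math


def neighbour_labels(z, i, j):
    """Yield the labels of the first-order neighbours of pixel (i, j)."""
    rows, cols = len(z), len(z[0])
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield z[ni][nj]


def log_pseudolikelihood(z, beta, k):
    """Sum over pixels of log p(z_i | neighbours of i, beta)."""
    total = 0.0
    for i in range(len(z)):
        for j in range(len(z[0])):
            nbrs = list(neighbour_labels(z, i, j))
            # Number of neighbours that carry each candidate label.
            counts = [sum(1 for n in nbrs if n == label) for label in range(k)]
            log_norm = math.log(sum(math.exp(beta * c) for c in counts))
            total += beta * counts[z[i][j]] - log_norm
    return total


def grid_estimate(z, k, grid):
    """Return the value in `grid` that maximises the log-pseudolikelihood."""
    return max(grid, key=lambda b: log_pseudolikelihood(z, b, k))


# Hypothetical labelling with two classes.
z = [[0, 0, 1], [0, 1, 1], [0, 1, 1]]
beta_hat = grid_estimate(z, k=2, grid=[0.1 * step for step in range(21)])
```

Each evaluation costs only one pass over the pixels, which is why pseudolikelihood scales well, although it is an approximation to the true likelihood.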

