Title: Lorentzian mixture model for Raman spectroscopy
Speaker: Matt Moores (University of Warwick, UK)
Date: Friday 7 August 2015
Time: 3:00pm – 4:00pm
Location: O603, Gardens Point Campus, Queensland University of Technology
Abstract: Surface-enhanced resonance Raman scattering (SERRS) is an optical spectroscopic technique for identifying labelled molecules. A SERRS signal can be represented as a mixture of Lorentzian peaks plus a smoothly-varying baseline due to background fluorescence. The multivariate observations are thus highly collinear and lend themselves to a reduced-rank representation. We introduce a sequential Monte Carlo algorithm for joint estimation of the baseline and peaks. Our model-based approach accounts for batch effects between technical replicates. By analysing a large number of Raman-active dyes with similar chemical structure, we can characterise how structural differences are represented as changes in the spectral signature. Our model can be used for identification and quantification of dye labels in a multiplex, as well as hierarchical clustering of dyes according to structural similarity.
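As a rough illustration of the signal representation described in the abstract (not the code used for the talk), the sketch below builds a synthetic spectrum as a sum of Lorentzian peaks plus a smooth baseline. The peak locations, widths and amplitudes are made-up values, and the linear baseline is an assumption chosen for brevity; the abstract only says the baseline is smoothly varying.

```python
import math

def lorentzian(x, location, scale, amplitude):
    """Lorentzian peak of height `amplitude` centred at `location`.

    `scale` is the half-width at half-maximum.
    """
    return amplitude * scale ** 2 / ((x - location) ** 2 + scale ** 2)

def baseline(x, coeffs):
    """Smooth baseline, here a low-order polynomial (an assumption)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

def spectrum(x, peaks, coeffs):
    """Noise-free signal: baseline plus a mixture of Lorentzian peaks."""
    return baseline(x, coeffs) + sum(lorentzian(x, *p) for p in peaks)

# Hypothetical peaks as (location, scale, amplitude), in wavenumber units.
peaks = [(1000.0, 10.0, 50.0), (1350.0, 15.0, 80.0)]
coeffs = [20.0, 0.01]  # hypothetical baseline: 20 + 0.01 * x
wavenumbers = [float(w) for w in range(500, 2001, 5)]
signal = [spectrum(w, peaks, coeffs) for w in wavenumbers]
```

Neighbouring wavenumbers share the same few peak and baseline parameters, which is one way to see why the observations are so highly collinear.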
I was very glad that I was able to attend my PhD graduation ceremony in Brisbane last week. My extended family were there to cheer me on, as well as both of my supervisors and my dear friend, Kal. Out of 28 doctorates there were a dozen from maths, including four statisticians. Unfortunately none of us snagged an Outstanding Doctoral Thesis Award, but it was nice to be nominated!
Once again I would like to thank my supervisors Kerrie Mengersen & Fiona Harden for their mentorship, as well as my co-authors Cathy Hargrave, Chris Drovandi, Christian Robert, Tony Pettitt, Tim Deegan & Mike Poulsen for their contributions to my research.
I will be presenting a talk at the ACEMS International Workshop on Monte Carlo Methods for Spatial Stochastic Systems (MCMSS) at the University of Queensland, Brisbane, July 21-23 (abstract below). Other speakers include Gareth Roberts, Adrian Baddeley, Robert Kohn & Kevin Burrage. The workshop programme is now available online.
Moores, Pettitt & Mengersen (2015) “Scalable Bayesian Inference for the Inverse Temperature of a Hidden Potts Model” arXiv:1503.08066 [stat.CO]
Moores, Drovandi, Mengersen & Robert (2015) “Pre-processing for approximate Bayesian computation in image analysis”
Statistics & Computing 25(1): 23-33. DOI: 10.1007/s11222-014-9525-6
My R package, bayesImageS version 0.1-21, is now available online. It uses a hidden Potts model with additive Gaussian noise for image segmentation of 2D and 3D datasets. The latent labels z can be simulated using chequerboard updating or the Swendsen-Wang algorithm. Several methods for full Bayesian inference with intractable likelihoods are supported, including pseudolikelihood, the exchange algorithm, path sampling, and approximate Bayesian computation (ABC-MCMC & ABC-SMC).
The R source package is released under the GNU General Public License, version 2. Its computational engine is implemented in C++ using RcppArmadillo and OpenMP. There is zero documentation available at present, but I’m working on that…
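To give a flavour of chequerboard updating, here is a minimal pure-Python sketch. It is not the bayesImageS API or its C++ engine, and all function names are hypothetical. It assumes a 2D lattice with a first-order (4-connected) neighbourhood and samples from the Potts prior only, leaving out the additive Gaussian likelihood term.

```python
import math
import random

def neighbours(i, j, nrow, ncol):
    """First-order (4-connected) neighbours of pixel (i, j), no wrap-around."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < nrow and 0 <= nj < ncol:
            yield ni, nj

def chequerboard_sweep(z, k, beta, rng):
    """One Gibbs sweep over the labels z (list of lists), updated in place.

    Pixels are split into two colours like a chequerboard. Pixels of one
    colour only have neighbours of the other colour, so they are
    conditionally independent and could be updated in parallel.
    """
    nrow, ncol = len(z), len(z[0])
    for colour in (0, 1):
        for i in range(nrow):
            for j in range(ncol):
                if (i + j) % 2 != colour:
                    continue
                # Count how many neighbours currently hold each label.
                counts = [0] * k
                for ni, nj in neighbours(i, j, nrow, ncol):
                    counts[z[ni][nj]] += 1
                # Full conditional under the Potts prior: p(label) ∝ exp(beta * count).
                weights = [math.exp(beta * c) for c in counts]
                z[i][j] = rng.choices(range(k), weights=weights)[0]
    return z

rng = random.Random(1)
k = 3
z = [[rng.randrange(k) for _ in range(8)] for _ in range(8)]
for _ in range(20):
    chequerboard_sweep(z, k, beta=1.0, rng=rng)
```

The loop here visits each colour sequentially; a parallel implementation would update all pixels of one colour at once.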
The process for Australian doctoral candidates is different from Europe and elsewhere. Due mainly to geographic isolation, there is no “viva voce” where external examiners are invited to interrogate the candidate in person. Rather, there is a public seminar followed by a meeting with the university thesis committee. Once recommended changes to the thesis have been made, it is sent to the external examiners for review. So if you’ve recently received my thesis in your inbox, then thanks in advance!
Some of my thesis papers have recently appeared online:
“Pre-processing for approximate Bayesian computation in image analysis”
Moores, Drovandi, Mengersen & Robert (2015)
Statistics & Computing 25(1): 23-33. DOI: 10.1007/s11222-014-9525-6
“An external field prior for the hidden Potts model, with application to cone-beam computed tomography”
Moores, Hargrave, Deegan, Poulsen, Harden & Mengersen (2015)
Computational Statistics & Data Analysis 86: 27–41. DOI: 10.1016/j.csda.2014.12.001
Previously I listed all of the software that I usually install on Windows 7. Now that I’ve got an iMac with OS X Mavericks, it’s time to give an updated list:
From the App Store
Follow these instructions for installing the Command Line Tools (you don’t need a full install of Xcode on Mavericks)
From other websites
gcc and gfortran 4.9.0 from “High Performance Computing for Mac OS X”
- requires sudo to untar into /usr/local/bin
- don’t forget to edit your Makeconf using these instructions…
Problems with Gatekeeper
I’ve submitted a poster to the Network on Computational Statistics and Machine Learning (NCSML) workshop “Big Data, Big Models, it is a Big Deal” at the University of Warwick on the 1st & 2nd of September. More details are available from the workshop homepage. My abstract is as follows:
Scalable Bayesian computation for intractable likelihoods in image analysis
The availability of inexpensive, high-quality imaging has given scientists the capacity to generate more data than ever before. In medicine, some patients are scanned daily throughout their course of treatment, to monitor their progress as well as for image-guided therapies. Satellites such as Landsat and MODIS orbit the globe, regularly providing remotely-sensed imagery of the Earth’s surface. Automated methods of image analysis are vital in order to keep pace with the volumes of data that are generated in these settings. Increases in image resolution and sample depth have improved the quality of these images, but this has also resulted in a vast increase in the size of the digital representation. Many methods that were originally developed for much smaller images are infeasible for the image dimensions that are required by current applications. Thus, the scalability of automated methods to meet the needs of real-world data is a major concern.
The hidden Potts model is widely applied in image analysis to segment the image pixels and label them according to their underlying classification. The inverse temperature parameter of this model governs the strength of spatial cohesion and therefore has a substantial influence over the resulting model fit. Estimating this parameter is difficult because the intractable normalising constant of the model depends on the value of the inverse temperature, so there is no closed-form solution for sampling from its distribution directly. We review three computational approaches for addressing this issue, namely pseudolikelihood, path sampling, and the approximate exchange algorithm. We compare the accuracy and scalability of these methods using a simulation study.
This is joint work with Clair Alston and Kerrie Mengersen, Queensland University of Technology, Australia.
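Of the three approaches named in the abstract, pseudolikelihood is the simplest to sketch: the intractable joint distribution of the labels is replaced by the product of each pixel's full conditional given its neighbours, which needs only a local normalising sum over the k labels. The Python below is a minimal illustration under assumed settings (a 2D lattice, 4-connected neighbourhood, labels treated as observed, and a grid search over the inverse temperature). The function names and the toy image are hypothetical, and this is not the implementation compared in the study.

```python
import math

def neighbour_counts(z, i, j, k):
    """Number of 4-connected neighbours of pixel (i, j) holding each label."""
    nrow, ncol = len(z), len(z[0])
    counts = [0] * k
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < nrow and 0 <= nj < ncol:
            counts[z[ni][nj]] += 1
    return counts

def log_pseudolikelihood(z, k, beta):
    """Sum over pixels of the log full conditional of each label.

    Each term is beta * (matching neighbours) minus the log of a local
    normalising sum, so no global normalising constant is required.
    """
    total = 0.0
    for i in range(len(z)):
        for j in range(len(z[0])):
            counts = neighbour_counts(z, i, j, k)
            log_norm = math.log(sum(math.exp(beta * c) for c in counts))
            total += beta * counts[z[i][j]] - log_norm
    return total

# Toy two-label image: all zeros except one isolated pixel in the centre.
z = [[0] * 5 for _ in range(5)]
z[2][2] = 1

# Crude point estimate: maximise over a grid of candidate values.
grid = [b / 10 for b in range(0, 31)]
beta_hat = max(grid, key=lambda b: log_pseudolikelihood(z, 2, b))
```

In a Bayesian analysis the same function could stand in for the likelihood inside an MCMC update for the inverse temperature, at the cost of the approximation error that the simulation study sets out to measure.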