The main reason that I use **hyperSpec** is for reading spectra in a variety of proprietary file formats, such as Thermo Galactic/Grams .spc files, as well as formats from Perkin Elmer, Horiba, Cytospec, Shimadzu and other spectrometer manufacturers. This was a major reason why I originally chose R over alternatives like Python or MATLAB for working with spectroscopy datasets. The package is of outstanding quality, very easy to use, and has excellent documentation.

I’d like to thank the package authors, Claudia Beleites and Valter Sergo, as well as all of the contributors: Alois Bonifacio, Marcel Dahms, Björn Egert, Simon Fuller, Vilmantas Gegzna, Rustam Guliev, Michael Hermes, Martin Kammer, Roman Kiselev, Sebastian Mellor, and of course Bryan Hanson. After all of the hours they’ve put into this software, I can appreciate how frustrating this must be. **hyperSpec** has certainly saved me many hours of work with its convenient functions for importing and plotting data.

I had a similar situation a while back with my other R package **bayesImageS**, where a package that I depended on was suddenly yoinked from CRAN. For some reason, this always seems to happen around Christmas and New Year’s Eve, when most package maintainers are unlikely to be checking their work email and hence least able to respond on short notice.

In any case, I have now removed **hyperSpec** from the list of suggested packages, even though I am unable to suggest a suitable replacement! I’ve installed version 0.99-20200115 from the CRAN archive and it is working fine. I’ve also updated the vignettes accordingly. A new version 0.4-1 of **serrsBayes** is now available on CRAN. Release notes are available here.

The New Yorker cartoon is by Edward Steed, with caption by whyDoesR?


James, Witten, Hastie & Tibshirani (2013) “An Introduction to Statistical Learning, with Applications in R” Springer.

Thomas (2018) “Mathematics for Machine Learning”

Irizarry (2019) “Introduction to Data Science: Data Analysis and Prediction Algorithms with R”

Welling (2010) “A First Encounter with Machine Learning”

Daumé III (2017) “A Course in Machine Learning”

Wickham & Grolemund (2017) “R for Data Science: Import, Tidy, Transform, Visualize, and Model Data” O’Reilly.

Wickham (2nd ed., 2019) “Advanced R” Chapman & Hall/CRC Press.

Wickham (2nd ed., 2015) “ggplot2: Elegant Graphics for Data Analysis”

Lovelace, Nowosad & Muenchow (2019) “Geocomputation with R” CRC Press.

Downey (2nd ed., 2014) “Think Stats: Exploratory Data Analysis in Python” O’Reilly.

Adhikari & DeNero “Computational and Inferential Thinking: The Foundations of Data Science”

Sklearn basics (Jupyter notebook)

Plotting and Visualization in Python (Jupyter notebook)

Hastie, Tibshirani & Friedman (2nd ed., 2009) “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” Springer.

Goodfellow, Bengio & Courville (2016) “Deep Learning” MIT Press.

Peyré (2019) “Mathematical Foundations of Data Sciences”

McElreath (2015; 2nd ed. 2020) “Statistical Rethinking: A Bayesian Course with Examples in R and Stan” Chapman & Hall/CRC Press. YouTube videos

Wikle, Zammit-Mangion & Cressie (2019) “Spatio-Temporal Statistics with R” Chapman & Hall/CRC Press.

Collins II (2003) “Fundamental Numerical Methods and Data Analysis”

Leskovec, Rajaraman & Ullman (3rd ed., 2020) “Mining of Massive Datasets” Cambridge University Press.

Hyndman & Athanasopoulos (2nd ed., 2018) “Forecasting: Principles and Practice” OTexts.

Blitzstein & Hwang (2nd ed., 2019) “Introduction to Probability” CRC Press.

Petersen & Pedersen (2012) “The Matrix Cookbook”

fast.ai (Jeremy Howard & Rachel Thomas)

Deep Learning Specialization (Andrew Ng, Coursera)

Intro to Hadoop and MapReduce (Udacity)

Statistical Learning (Trevor Hastie & Rob Tibshirani, Stanford Online)

Linear Algebra (Gilbert Strang, MIT OCW)


Tuesday, June 25 at 5:15 for a 6pm start

College of Business and Economics, Australian National University, Canberra, ACT

The planned Mars 2020 mission to Jezero Crater will include a rover equipped with two Raman spectrometers: SuperCam and SHERLOC. This will be the first time that Raman spectroscopy is performed on the Martian surface, enabling new kinds of analysis of minerals and organic molecules. In the meantime, the Mars Science Laboratory Curiosity rover continues to add to the massive dataset of laser-induced breakdown spectroscopy (LIBS) that it has been accumulating since 2012. These data pose particular challenges for statistical signal processing, since pre-flight calibration on Earth can only approximate Martian environmental conditions. Analytical methods must be robust to artefacts and other changes in the spectral profile, such as nonlinear interactions between signals. This talk will introduce a Bayesian method for source separation of spectroscopic signals. We derive informative priors from online databases of known reference spectra, as well as quantum-mechanical computer models. The components of the combined spectrum are identified and quantified using a sequential Monte Carlo algorithm. An open-source implementation of our method is available in the R package ‘**serrsBayes**.’

Tuesday, July 9 at 12pm

Business School, University of Technology, Sydney, NSW

The Potts model is commonly used for classification, where the labels are spatially correlated. The strength of spatial association is governed by a smoothing parameter, known as the inverse temperature. A difficulty arises from the dependence of an intractable normalising constant on the value of this parameter, and thus there is no closed-form solution for sampling from the posterior distribution directly. There are a variety of Markov chain Monte Carlo methods for sampling from the posterior without evaluating the normalising constant, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space, such as images with a million or more pixels. In this talk, I will introduce the parametric functional approximate Bayesian (PFAB) algorithm, which uses an integral curve to approximate the score function of the Potts model. PFAB incorporates known properties of the likelihood, such as heteroskedasticity and critical temperature. I will demonstrate this method using synthetic data as well as remotely-sensed imagery from the Landsat-8 satellite. The proposed algorithm achieves up to a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. An open source implementation of PFAB is available in the R package ‘**bayesImageS**.’

We are pleased to announce two upcoming talks by Dr Anthony Lee (Senior Lecturer at the University of Bristol): Tuesday, July 2 at QUT and Thursday, July 18 at Monash University. The call for abstracts is now open for Bayes on the Beach. 250-word abstracts can be submitted by email to bob.admin@qut.edu.au before August 16. We also mention some other upcoming conferences: MCM 2019, July 8-12 in Sydney; EAC-ISBA 2019, July 13-14 in Kobe, Japan; BayesComp 2020, January 7-10 in Florida, USA; and ABC in Grenoble, March 19-20 in France. __Read more here__.

Two example Raman spectra, methanol and TAMRA, are now available in the R package, along with a new vignette.

The vignette explains the main differences between the three functions fitSpectraMCMC, fitSpectraSMC, and fitVoigtPeaksSMC, and how to choose which one to use in a given situation. The methanol spectrum was kindly provided by my co-author, Professor Karen Faulds. It has only 4 (maybe 5?) peaks, so it is a bit easier to see what is going on than in the TAMRA example in the first vignette.

I’m currently working on a new function that uses the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). Expect that to be available in the R package later in the year.


**When:** 11am, Wednesday December 5

**Where:** Building 39A, Room 208, University of Wollongong, NSW (main campus)

**Speaker:** A/Prof Mirko Draca, Department of Economics, University of Warwick, UK

**Abstract:**

Strong evidence has been emerging that major democracies have become more politically polarized, at least according to measures based on the ideological positions of political elites. We ask: have the general public (‘citizens’) followed the same pattern? Our approach is based on unsupervised machine learning models applied to issue-position survey data. This approach firstly indicates that coherent, latent ideologies are strongly apparent in the data, with a number of major, stable types that we label Liberal Centrist, Conservative Centrist, Left Anarchist and Right Anarchist. Using this framework, and a resulting measure of ‘citizen slant’, we are then able to decompose the shift in ideological positions across the population over time. Specifically, we find evidence of a ‘disappearing center’ in a range of countries, with citizens shifting away from centrist ideologies into anti-establishment ‘anarchist’ ideologies over time. This trend is especially pronounced for the US.

This is joint work with Carlo Schwarz (University of Warwick).

**When:** Wednesday December 12

**Where:** P504, QUT Gardens Point Campus, George St, Brisbane QLD

**Speaker:** Dr Matt Moores, Lecturer in Statistical Science, University of Wollongong

**Abstract:**

The hidden Potts model can be used for image segmentation, where the pixels are assumed to be noisy observations of some hidden states. The inverse temperature parameter governs the strength of spatial cohesion between neighbours in the image lattice. A difficulty arises from the dependence of an intractable normalising constant on the value of this parameter and thus there is no closed-form solution for sampling from the posterior distribution directly. There are a variety of computational approaches for sampling from the posterior without evaluating the normalising constant, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space, such as images with a million or more pixels. In this talk, I will introduce the parametric functional approximate Bayesian (PFAB) algorithm, which uses an integral curve to approximate the score function. PFAB incorporates known properties of the likelihood, such as heteroskedasticity and critical temperature. I will demonstrate this method using synthetic data as well as remotely-sensed imagery from the Landsat-8 satellite. The proposed algorithm achieves up to a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. An open source implementation of PFAB is available in the R package ‘bayesImageS’.

This is joint work with Kerrie Mengersen & Tony Pettitt (QUT) and Geoff Nicholls (Oxford).


The spectral signature of a molecule can be predicted using a quantum-mechanical model, such as time-dependent density functional theory (TD-DFT). However, there are no uncertainty estimates associated with these predictions, and matching with peaks in observed spectra is often performed by eye. This talk introduces a model-based approach for baseline estimation and peak fitting, using TD-DFT predictions as an informative prior. The peaks are modelled as a mixture of Lorentzian, Gaussian, or pseudo-Voigt broadening functions, while the baseline is represented as a penalised cubic spline. We fit this model using a sequential Monte Carlo (SMC) algorithm, which is robust to local maxima and enables the posterior distribution to be incrementally updated as more data becomes available. We apply our method to multivariate calibration of Raman-active dye molecules, enabling us to estimate the limit of detection (LOD) of each peak.
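To make the three broadening functions concrete, here is a minimal Python sketch, assuming unit-height peaks parameterised by location and full width at half maximum (FWHM), with the pseudo-Voigt taken as the usual weighted mix of a Lorentzian and a Gaussian sharing the same FWHM. The function names, the (amplitude, location, fwhm, eta) tuple layout, and the linear baseline standing in for the penalised cubic spline are all illustrative assumptions, not the **serrsBayes** API.

```python
import math


def lorentzian(x, loc, hwhm):
    """Lorentzian (Cauchy) peak of unit height at x = loc; hwhm is the half width at half maximum."""
    return hwhm ** 2 / ((x - loc) ** 2 + hwhm ** 2)


def gaussian(x, loc, sigma):
    """Gaussian peak of unit height at x = loc with standard deviation sigma."""
    return math.exp(-0.5 * ((x - loc) / sigma) ** 2)


def pseudo_voigt(x, loc, fwhm, eta):
    """Weighted mix eta * Lorentzian + (1 - eta) * Gaussian, both with the same FWHM."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    hwhm = fwhm / 2.0
    # A Gaussian's FWHM equals 2 * sqrt(2 ln 2) * sigma.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return eta * lorentzian(x, loc, hwhm) + (1.0 - eta) * gaussian(x, loc, sigma)


def signal(x, peaks, baseline):
    """Noise-free signal at x: baseline(x) plus a sum of amplitude-scaled pseudo-Voigt peaks.

    peaks is a list of (amplitude, location, fwhm, eta) tuples (hypothetical layout).
    """
    return baseline(x) + sum(a * pseudo_voigt(x, loc, fwhm, eta) for a, loc, fwhm, eta in peaks)


def linear_baseline(x):
    """Hypothetical linear stand-in for the penalised cubic spline baseline."""
    return 50.0 + 0.01 * x


# Illustrative values only: two peaks on a sloping baseline, evaluated at the first peak.
example_peaks = [(1000.0, 1030.0, 12.0, 0.5), (400.0, 1450.0, 20.0, 0.8)]
print(signal(1030.0, example_peaks, linear_baseline))
```

Setting eta to 1 or 0 recovers the pure Lorentzian or Gaussian, so a single pseudo-Voigt family covers all three shapes; fitting the amplitudes, widths and baseline (and adding the noise model) is where the SMC algorithm would come in.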


The Potts (1952) model is an example of a Gibbs random field on a regular lattice, where each node $z_i$ can take values in the set $\{1, \dots, q\}$. The Ising model can be viewed as a special case, when *q*=2. The size of the configuration space $\mathcal{Z}$ is therefore $q^n$, where *n* is the number of nodes. The dual lattice $\mathcal{E}$ defines undirected edges $i \sim \ell$ between neighbouring nodes $i$ and $\ell$. If the nodes in a 2D lattice with *c* columns are indexed row-wise, the nearest (first-order) neighbours of node $i$ are $\partial(i) = \{i - c,\; i - 1,\; i + 1,\; i + c\}$, except at the boundary. Nodes situated on the boundary of the domain have fewer than four neighbours. The total number of unique edges is thus $|\mathcal{E}| = 2(n - \sqrt{n})$ for a square lattice, or $|\mathcal{E}| = 2n - r - c$ if the lattice is rectangular with *r* rows.
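The row-wise indexing and the edge count can be checked with a small Python sketch. It numbers nodes from 0 rather than 1 (an assumption for Python convenience), and the helper names `neighbours` and `edge_list` are hypothetical. The loop compares the number of unique edges with 2n - r - c.

```python
def neighbours(i, r, c):
    """First-order neighbours of node i on an r x c lattice, nodes numbered row-wise from 0.

    Interior nodes have the four neighbours {i - c, i - 1, i + 1, i + c};
    nodes on the boundary of the lattice have fewer.
    """
    row, col = divmod(i, c)
    nb = []
    if row > 0:
        nb.append(i - c)  # node above
    if col > 0:
        nb.append(i - 1)  # node to the left
    if col < c - 1:
        nb.append(i + 1)  # node to the right
    if row < r - 1:
        nb.append(i + c)  # node below
    return nb


def edge_list(r, c):
    """Unique undirected edges (i, l) with i < l, i.e. the dual lattice."""
    return [(i, l) for i in range(r * c) for l in neighbours(i, r, c) if i < l]


for r, c in [(2, 2), (3, 3), (3, 5), (10, 10)]:
    n = r * c
    print(r, c, len(edge_list(r, c)), 2 * n - r - c)
```

Keeping only pairs with i < l is what turns the symmetric neighbour relation into a set of unique undirected edges.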

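The sufficient statistic of the Potts model is the sum of all like neighbour pairs:

$$S(\mathbf{z}) = \sum_{i \sim \ell \in \mathcal{E}} \delta(z_i, z_\ell)$$

where $\delta(a, b)$ is the Kronecker delta function, which equals 1 if $a = b$ and equals 0 otherwise. $S(\mathbf{z})$ ranges from 0, when all of the nodes form a chequerboard pattern, to $|\mathcal{E}|$ when all of the nodes have the same value. The likelihood of the Potts model is thus:

$$p(\mathbf{z} \mid \beta) = \frac{\exp\{\beta\, S(\mathbf{z})\}}{\mathcal{C}(\beta)}$$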

The normalising constant of the Potts model is intractable for any non-trivial lattice, since it requires a sum over the configuration space:

$$\mathcal{C}(\beta) = \sum_{\mathbf{z} \in \mathcal{Z}} \exp\{\beta\, S(\mathbf{z})\}$$

When the inverse temperature $\beta = 0$, $\mathcal{C}(\beta)$ simplifies to $q^n$ and the likelihood to $p(\mathbf{z} \mid \beta) = q^{-n}$, hence the labels are independent and uniformly distributed.
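A brute-force Python sketch makes the intractability concrete: computing the normalising constant means visiting every one of the q^n configurations, which is only feasible for tiny lattices. Labels run from 0 to q - 1 instead of 1 to q (an assumption that does not change S), and the helper names are hypothetical.

```python
import itertools
import math


def edge_list(r, c):
    """Unique first-order edges of an r x c lattice, nodes numbered row-wise from 0."""
    edges = []
    for i in range(r * c):
        row, col = divmod(i, c)
        if col < c - 1:
            edges.append((i, i + 1))  # horizontal neighbour
        if row < r - 1:
            edges.append((i, i + c))  # vertical neighbour
    return edges


def suff_stat(z, edges):
    """S(z): the number of edges whose two end nodes carry the same label."""
    return sum(1 for i, l in edges if z[i] == z[l])


def norm_const(beta, q, r, c):
    """Brute-force C(beta): sum of exp(beta * S(z)) over all q**(r*c) configurations."""
    edges = edge_list(r, c)
    return sum(math.exp(beta * suff_stat(z, edges))
               for z in itertools.product(range(q), repeat=r * c))


def likelihood(z, beta, q, r, c):
    """p(z | beta) = exp(beta * S(z)) / C(beta); only feasible for tiny lattices."""
    return math.exp(beta * suff_stat(z, edge_list(r, c))) / norm_const(beta, q, r, c)


print(norm_const(0.0, 2, 2, 2))                 # 16.0, i.e. q**n
print(likelihood((0, 1, 1, 0), 0.0, 2, 2, 2))   # 0.0625, i.e. q**(-n)
```

Even a modest 10 x 10 binary image would need 2**100 terms, which is why the methods discussed in these talks avoid evaluating the normalising constant directly.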

**Theorem 1.** The sum over configuration space of the sufficient statistic of the *q*-state Potts model on a rectangular 2D lattice is $\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z}) = q^{n-1}(2n - r - c)$.

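For a *q*=2 state Potts model on a lattice with *n*=4 nodes and $|\mathcal{E}| = 4$ edges, $\mathcal{Z}$ contains $2^4 = 16$ possible configurations: two in which all nodes agree ($S = 4$), two chequerboards ($S = 0$), and twelve with $S = 2$, so $\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z}) = 2 \times 4 + 12 \times 2 + 2 \times 0 = 32$. This can also be written as $q^{n-1}(2n - r - c) = 2^3 \times 4$.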

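Now consider a rectangular lattice with *r* > 1 rows and *c* > 1 columns, so that $n = rc$ and the dual lattice $|\mathcal{E}| = 2rc - r - c$. The size of the configuration space is $q^{rc}$. Assume that the sum over configuration space is equal to $\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z}) = q^{rc-1}(2rc - r - c)$. This sum can be decomposed into $q^{rc-1}\, r(c - 1)$ within the rows, plus $q^{rc-1}\, c(r - 1)$ between rows.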

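If this lattice is extended by adding another row (or equivalently, another column), then $n = (r+1)c$ (or otherwise, $n = r(c+1)$) and the dual lattice $|\mathcal{E}| = 2(r+1)c - (r+1) - c$. The nodes in this new row can take $q^c$ possible values, so the size of the configuration space is now $q^{(r+1)c}$. The sum $\sum_{\mathbf{z}} S(\mathbf{z})$ will increase proportional to $c - 1$ for the new row, plus $c$ for the connections with its adjacent row:

$$\sum_{\mathbf{z}} S(\mathbf{z}) = q^c\, q^{rc-1}(2rc - r - c) + q^{(r+1)c-1}(c - 1) + q^{(r+1)c-1}\, c = q^{(r+1)c-1}\left(2(r+1)c - (r+1) - c\right)$$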

Q.E.D.
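As a sanity check on Theorem 1, here is a Python sketch that enumerates every configuration of a few small lattices and compares the total with q^(n-1)(2n - r - c). Labels are 0 to q - 1 (an assumption for Python convenience), and the helper names are hypothetical.

```python
import itertools
from collections import Counter


def edge_list(r, c):
    """Unique first-order edges of an r x c lattice, nodes numbered row-wise from 0."""
    edges = []
    for i in range(r * c):
        row, col = divmod(i, c)
        if col < c - 1:
            edges.append((i, i + 1))  # horizontal neighbour
        if row < r - 1:
            edges.append((i, i + c))  # vertical neighbour
    return edges


def suff_stats(q, r, c):
    """S(z) for every configuration z of a q-state Potts model on an r x c lattice."""
    edges = edge_list(r, c)
    return [sum(z[i] == z[l] for i, l in edges)
            for z in itertools.product(range(q), repeat=r * c)]


# The 2 x 2, q = 2 example: how many configurations give each value of S.
print(Counter(suff_stats(2, 2, 2)))

# Compare the enumerated sum with q**(n - 1) * (2n - r - c) on a few small lattices.
for q, r, c in [(2, 2, 2), (2, 2, 3), (3, 2, 2), (2, 3, 3), (3, 2, 3)]:
    n = r * c
    assert sum(suff_stats(q, r, c)) == q ** (n - 1) * (2 * n - r - c)
```

The Counter reproduces the breakdown of the 16 configurations in the base case (two with S = 4, twelve with S = 2, two with S = 0).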

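**Theorem 2.** The expectation of the sufficient statistic of the *q*-state Potts model on a rectangular 2D lattice is $\mathbb{E}[S(\mathbf{z}) \mid \beta = 0] = \frac{2n - r - c}{q}$ when the inverse temperature $\beta = 0$.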

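The proof follows from Theorem 1 by noting that $p(\mathbf{z} \mid \beta = 0) = q^{-n}$ and hence:

$$\mathbb{E}[S(\mathbf{z}) \mid \beta = 0] = \sum_{\mathbf{z} \in \mathcal{Z}} q^{-n}\, S(\mathbf{z}) = q^{-n}\, q^{n-1}(2n - r - c) = \frac{2n - r - c}{q}$$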

Q.E.D.

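**Theorem 3.** The sum over configuration space of the square of the sufficient statistic of the *q*-state Potts model on a rectangular 2D lattice is

$$\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z})^2 = q^{n-2}(2n - r - c)(2n - r - c + q - 1)$$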

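For a *q*=2 state Potts model on a lattice with *n*=4 nodes and $|\mathcal{E}| = 4$ edges, $\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z})^2 = 2 \times 4^2 + 12 \times 2^2 + 2 \times 0^2 = 80$. This can also be written as $q^{n-2}(2n - r - c)(2n - r - c + q - 1) = 2^2 \times 4 \times 5$.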

Now assume for a rectangular lattice with *r* > 1 rows and *c* > 1 columns that

$$\sum_{\mathbf{z} \in \mathcal{Z}} S(\mathbf{z})^2 = q^{rc-2}(2rc - r - c)(2rc - r - c + q - 1)$$

This can be decomposed into $q^{rc-2}(2rc - r - c)^2 + q^{rc-2}(q - 1)(2rc - r - c)$.

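If we extend the lattice by adding another row, then $S(\mathbf{z}') = S(\mathbf{z}) + T(\mathbf{z}')$, where $T$ counts the like pairs among the $2c - 1$ new edges ($c - 1$ within the new row, plus $c$ connecting it to its adjacent row). Each edge is a like pair in exactly $q^{(r+1)c-1}$ of the $q^{(r+1)c}$ configurations, and any two distinct edges are both like pairs in exactly $q^{(r+1)c-2}$ of them, so with $|\mathcal{E}| = 2rc - r - c$:

$$\begin{aligned}
\sum_{\mathbf{z}'} S(\mathbf{z}')^2 &= q^c \sum_{\mathbf{z}} S(\mathbf{z})^2 + 2 \sum_{\mathbf{z}'} S(\mathbf{z})\, T(\mathbf{z}') + \sum_{\mathbf{z}'} T(\mathbf{z}')^2 \\
&= q^{(r+1)c-2} \left[ |\mathcal{E}|^2 + (q - 1)|\mathcal{E}| + 2|\mathcal{E}|(2c - 1) + (2c - 1)^2 + (q - 1)(2c - 1) \right] \\
&= q^{(r+1)c-2} \left(|\mathcal{E}| + 2c - 1\right)\left(|\mathcal{E}| + 2c - 1 + q - 1\right)
\end{aligned}$$

where $|\mathcal{E}| + 2c - 1 = 2(r+1)c - (r+1) - c$ is the number of edges in the extended lattice.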

Q.E.D.
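Theorem 3 can be checked the same way. This Python sketch enumerates all configurations of a few small lattices and compares the sum of S(z)^2 with q^(n-2)(2n - r - c)(2n - r - c + q - 1); labels are 0 to q - 1 and the helper names, including `theorem3`, are hypothetical.

```python
import itertools


def edge_list(r, c):
    """Unique first-order edges of an r x c lattice, nodes numbered row-wise from 0."""
    edges = []
    for i in range(r * c):
        row, col = divmod(i, c)
        if col < c - 1:
            edges.append((i, i + 1))  # horizontal neighbour
        if row < r - 1:
            edges.append((i, i + c))  # vertical neighbour
    return edges


def sum_sq_suff_stat(q, r, c):
    """Sum over all configurations of S(z)**2, by brute-force enumeration."""
    edges = edge_list(r, c)
    return sum(sum(z[i] == z[l] for i, l in edges) ** 2
               for z in itertools.product(range(q), repeat=r * c))


def theorem3(q, r, c):
    """Closed form q**(n - 2) * |E| * (|E| + q - 1), with |E| = 2n - r - c."""
    n = r * c
    e = 2 * n - r - c
    return q ** (n - 2) * e * (e + q - 1)


for q, r, c in [(2, 2, 2), (3, 2, 2), (2, 2, 3), (2, 3, 3), (3, 2, 3)]:
    print(q, r, c, sum_sq_suff_stat(q, r, c), theorem3(q, r, c))
```

Enumeration grows as q^n, so this only confirms the formula on small cases; the induction above is what covers every rectangular lattice.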

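**Theorem 4.** The variance of the sufficient statistic of the *q*-state Potts model on a rectangular 2D lattice is $\mathrm{Var}[S(\mathbf{z}) \mid \beta = 0] = \frac{(q - 1)(2n - r - c)}{q^2}$ when the inverse temperature $\beta = 0$.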

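The proof follows from Theorems 1 and 3:

$$\mathrm{Var}[S(\mathbf{z}) \mid \beta = 0] = q^{-n} \sum_{\mathbf{z}} S(\mathbf{z})^2 - \left( q^{-n} \sum_{\mathbf{z}} S(\mathbf{z}) \right)^2 = \frac{(2n - r - c)(2n - r - c + q - 1)}{q^2} - \frac{(2n - r - c)^2}{q^2} = \frac{(q - 1)(2n - r - c)}{q^2}$$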

Q.E.D.
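Theorems 2 and 4 can be confirmed exactly by enumeration, using rational arithmetic so that no floating-point rounding is involved. This Python sketch treats all q^n configurations as equally likely (the β = 0 case) and compares the mean and variance of S(z) with (2n - r - c)/q and (q - 1)(2n - r - c)/q^2; labels are 0 to q - 1 and the helper names are hypothetical.

```python
import itertools
from fractions import Fraction


def edge_list(r, c):
    """Unique first-order edges of an r x c lattice, nodes numbered row-wise from 0."""
    edges = []
    for i in range(r * c):
        row, col = divmod(i, c)
        if col < c - 1:
            edges.append((i, i + 1))  # horizontal neighbour
        if row < r - 1:
            edges.append((i, i + c))  # vertical neighbour
    return edges


def moments_beta0(q, r, c):
    """Exact mean and variance of S(z) when beta = 0 (all q**n configurations equally likely)."""
    edges = edge_list(r, c)
    stats = [sum(z[i] == z[l] for i, l in edges)
             for z in itertools.product(range(q), repeat=r * c)]
    total = len(stats)
    mean = Fraction(sum(stats), total)
    var = Fraction(sum(s * s for s in stats), total) - mean ** 2
    return mean, var


for q, r, c in [(2, 2, 2), (3, 2, 3), (2, 3, 4)]:
    n = r * c
    e = 2 * n - r - c
    mean, var = moments_beta0(q, r, c)
    assert mean == Fraction(e, q)                # Theorem 2
    assert var == Fraction((q - 1) * e, q ** 2)  # Theorem 4
    print(q, r, c, mean, var)
```

Computing the variance as E[S^2] - E[S]^2 mirrors the proof of Theorem 4 directly.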


There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.


- Google Chrome
- Dropbox
- TeX Live 2018 (including TeXworks)
- Java SE JDK 8u171 with NetBeans 8.2 IDE (64 bit)
- JabRef (64 bit)
- Microsoft R Open 3.5.0 (including MKL)
- Rtools34
- RStudio Desktop 1.1.453
- Microsoft Office Home & Student 2016 (64 bit)
- IBM SPSS Statistics 23
- Adobe Acrobat Reader DC
- PuTTY 0.7 (64 bit)
- WinSCP 5.13.3
- Git 2.18.0 for Windows (64 bit)

With all of this installed and my Dropbox synced, I have 197 GB of free space on my 446 GB solid-state drive.


This will be my farewell tour of the UK, as I’ll be relocating back to Australia after an amazing four years as a postdoc at the University of Warwick. After UseR!, I’ll be taking up a lectureship in the School of Mathematics and Statistics and the National Institute for Applied Statistics Research Australia (NIASRA) at the University of Wollongong.

ABC in Edinburgh, Sunday June 24

The inverse temperature parameter of the Potts model governs the strength of spatial cohesion and therefore has a major influence over the resulting model fit. A difficulty arises from the dependence of an intractable normalising constant on the value of this parameter and thus there is no closed-form solution for sampling from the posterior distribution directly. There are a variety of computational approaches for sampling from the posterior without evaluating the normalising constant, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space, such as images with a million or more pixels. We introduce a parametric surrogate model, which approximates the score function using an integral curve. Our surrogate model incorporates known properties of the likelihood, such as heteroskedasticity and critical temperature. We demonstrate this method using synthetic data as well as remotely-sensed imagery from the Landsat-8 satellite. We achieve up to a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. An open source implementation of our algorithm is available in the R package ‘bayesImageS’.

Moores, Pettitt & Mengersen (2015; v2 2018) “Scalable Bayesian inference for the inverse temperature of a hidden Potts model” arXiv:1503.08066 [stat.CO]

ISBA World Meeting, University of Edinburgh, Monday June 25

Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, comprising the locations and amplitudes of its peaks. The presence of a large, non-uniform background presents a major challenge to analysis of these spectra. We introduce a sequential Monte Carlo (SMC) algorithm to separate the observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. The peaks are modelled as Lorentzian, Gaussian, or pseudo-Voigt functions, while the baseline is estimated using a penalised cubic spline. Our model-based approach accounts for differences in resolution and experimental conditions. We incorporate prior information to improve identifiability and regularise the solution. By utilising this representation in a Bayesian functional regression, we can quantify the relationship between molecular concentration and peak intensity, resulting in an improved estimate of the limit of detection. The posterior distribution can be incrementally updated as more data becomes available, resulting in a scalable algorithm that is robust to local maxima. These methods have been implemented as an R package, using RcppEigen and OpenMP.

Moores, Gracie, Carson, Faulds, Graham & Girolami (2016; v2 2018) “Bayesian modelling and quantification of Raman spectroscopy” arXiv:1604.07299 [stat.AP]
