Since I’ve been invited to give a seminar at an OxWaSP mini-symposium, I decided it was finally time to get my R package bayesImageS in shape for submission to the CRAN repository. Recently, the R-hub builder was released as a public beta. I was keen to check this out, since it is meant to make the process of submitting packages to CRAN simpler, particularly for first time package authors such as myself.

These instructions are for OS X Mavericks. Upgrading to el Capitan will almost certainly prevent this from working, due to the system integrity protection features that were introduced.

Clang 3.8.0 supports OpenMP, but the version of clang included with the Xcode command-line tools does not. Since clang is the default compiler for R on OS X, R packages that use OpenMP directives will run in parallel on Linux or Windows, but single-threaded on Mac OS. Both of my R packages, serrsBayesbayesImageS, were written with Rcpp using OpenMP, so this is a problem for me. I do most of my heavy computation on a Linux cluster, but I develop and test a lot of code on my iMac. I’d really like my code to make full use of the multicore i7 CPU, rather than taking 5-8 times longer to run.

There are two main options for compiling with OpenMP on Mac OS: installing the latest Clang from LLVM, or installing gcc and replacing the symbolic link in /usr/bin. When I first encountered this issue, LLVM clang still didn’t support all of the #pragma omp directives, so I chose the second option. You still need to install gcc to get gfortran support on OS X, so that seemed like the best option. However, in Mac OS 10.11 (el Capitan), /usr/bin is read only.

Whichever option you choose, you then need to add the following lines to your personal or site-specific Makevars file:

CC=/usr/local/bin/gcc
CXX=/usr/local/bin/g++ -arch x86_64 -ftemplate-depth-256
PKG_LIBS = \$(R_HOME)/bin/Rscript -e "Rcpp:::LdFlags()" -fopenmp
CXXFLAGS += -fopenmp

Warning: it is definitely a bad idea to add these lines to an R package Makevars, since it will break the build for any Mac OS user who compiles with clang instead of gcc (sorry, Jake…)

Single-threaded, my Metropolis-Hastings algorithm takes 45 seconds for 10,000 iterations. When I compile with OpenMP enabled, it takes 27 sec. Not a huge speedup perhaps, but it makes a big difference for longer-running computation.

Note that some R packages, such as rstan, modify the personal Makevars file. If OpenMP suddenly stops working, that could be the culprit. Check ~/.R/Makevars and make sure that the above settings are still intact.

Igor recently posted some interesting data, showing the change in temperature over time with 2 different types of fan, or without any cooling. This data exhibits some unusual features: the underlying trend is smooth, over a long timeframe (minutes) but there is also periodic switching between “low” and “high” signal, which is sometimes accompanied by jitter. I’ll first show how to generate some similar data in R, then how we might compute the discrete wavelet transform (DWT) in Google Sheets. Of course, the simplest solution is often the best (unless you work in research…), so check out Igor’s post for a quicker way to filter this data.

Following up on my previous post, a commenter on Reddit asked to see a comparison with Microsoft R Open using multithreaded Intel MKL:

Microsoft R Open 3.2.5
Default CRAN mirror snapshot taken on 2016-05-01
The enhanced R distribution from Microsoft
Visit https://mran.microsoft.com/ for information

Multithreaded BLAS/LAPACK libraries detected. Using 4 cores for math algorithms.

RcppEigen, RcppArmadillo & RcppGSL were all installed from MRAN as binary packages. As expected, MKL outperforms GotoBLAS:

lm benchmark for n = 1650 and p = 875: nrep = 20
user system elapsed
1908.28  27.55 1870.58

test relative elapsed user.self sys.self
3  LDLt    1.000    4.61      4.44     0.18
7  QR      1.386    6.39      6.12     0.25
8  LLt     1.460    6.73      6.42     0.32
1  lm.fit  1.631    7.52      7.40     0.08
9  arma    2.206   10.17     37.87     2.70
4  GESDD   2.560   11.80     43.82     2.86
6  SymmEig 5.514   25.42     25.14     0.26
2  PivQR   9.857   45.44     25.26    20.15
5  SVD   188.666  869.75    868.87     0.42
10 GSL   191.262  881.72    881.45     0.27

There are none of the multi-threading issues that I observed in the previous benchmark. Elapsed time for lm.fit improves from 18.11 to 7.52, arma improves from 90.03 to 10.17, and GESDD improves from 90.77 to 11.80 seconds (in comparison to default libRblas). Compared with single-threaded SurviveGotoBLAS, MKL gives a 3x speedup for RcppArmadillo and GESDD. The other methods all perform roughly the same, since they do not rely on the BLAS library.

The research that I presented at Oxford and QUT is now available on arXiv. The abstract is below:

Raman spectroscopy is a technique for detecting and identifying molecules such as DNA. It is sensitive at very low concentrations and can accurately quantify the amount of a given molecule in a sample. The presence of a large, nonuniform background presents a major challenge to analysis of these spectra. We introduce a sequential Monte Carlo (SMC) algorithm to separate the observed spectrum into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. Our model-based approach accounts for differences in resolution and experimental conditions. By incorporating this representation into a Bayesian functional regression, we can quantify the relationship between molecular concentration and peak intensity, resulting in an improved estimate of the limit of detection. We also calculate the model evidence using SMC to investigate long-range dependence between peaks.

This is joint work with Mark Girolami (Warwick & ATI), Jake Carson (Warwick), Kirsten Gracie, Karen Faulds & Duncan Graham (Strathclyde).

I’ve been invited to present my work on sequential Monte Carlo methods for Raman spectroscopy at the Oxford Computational Statistics and Machine Learning Reading Group (OCSMLRG), 11am on Friday 11 March. I’ve made some good progress since my seminar at QUT last year, so I’m looking forward to presenting these methods for a new audience. The abstract of my talk is below.

Musings, useful code etc. on R and data science

R-bloggers

R news and tutorials contributed by (600) R bloggers

Chris Oates

ACEMS Research Fellow, School of Mathematical and Physical Sciences, University of Technology Sydney

Another Astrostatistics Blog

The random musings of a reformed astronomer ...

Darren Wilkinson's research blog

Statistics, computing, data science, Bayes, stochastic modelling, systems biology and bioinformatics

CHANCE

Computational Bayesian statistics

StatsLife - Significance magazine

Computational Bayesian statistics

Computational Bayesian statistics

Igor's Blog

Computational Bayesian statistics

Statisfaction

I can't get no

Xi'an's Og

an attempt at bloggin, nothing more...

Sam Clifford

Postdoctoral Fellow, Bayesian Statistics, Aerosol Science

Bayesian Research & Applications Group

Frontier Research in Bayesian Methodology & Computation