I’ve recently written my first R package using RcppArmadillo, but there are a few things about the performance of my code that puzzle me:
- I switched from using the Hadamard (element-wise) product to taking logs and then summing, a common statistical practice for avoiding underflow. This made my code roughly 50% slower, so I have backed the change out.
- I compiled R from source on Linux with the Intel C, C++ & Fortran compilers and replaced the reference Rblas with Intel MKL. The resulting build runs slower on my university's SGI Altix cluster than the gcc-compiled R that was already available through PBS.
- Most of the compute nodes in the SGI cluster are dual-socket machines with Xeon E5-2670 CPUs (8 cores @ 2.60GHz), so I can run up to 16 compute threads on a single node. However, running 16 threads is actually slower than running 6.
The relationship one would expect is that the elapsed time halves each time the number of cores doubles (minus some parallel overhead), while the total CPU time stays roughly constant (plus that overhead). According to these CPU usage figures, however, adding threads contributes nothing but overhead:
Yep, looks pretty linear…