
R with GotoBLAS on Windows 10

February 7, 2016

In my experience, the default libRblas gives poor performance on all of the platforms that I regularly use (Windows, Mac OS X & Linux). The R package that I’m currently developing uses RcppEigen, which does not depend on an efficient BLAS or LAPACK library. However, many other R packages do have this dependency, so I would recommend following the instructions in the R Installation and Administration guide to switch to a more efficient implementation. Avraham Adler, Tony Fischetti and Zachary Mayer have written similar blog posts on this topic. I use the Accelerate umbrella framework (vecLib) on OS X and Intel MKL with icc on Linux. The following instructions describe how (and why) to install GotoBLAS for R on Microsoft Windows.


As described in pp. 191-192 of “Seamless R and C++ Integration with Rcpp” (DOI: 10.1007/978-1-4614-6868-4), the lmBenchmark script can be used as a rough performance measurement for dense matrix algebra on any system. You need to install the packages RcppEigen and rbenchmark, then run:

Rscript -e "source(system.file(\"examples\", \"lmBenchmark.R\", package = \"RcppEigen\"))"

The output should look something like this (with default Rblas.dll):

lm benchmark for n = 1650 and p = 875: nrep = 20
   user system  elapsed 
2021.83  21.27  2043.75 

   test relative elapsed user.self sys.self
3  LDLt    1.000    4.70      4.54     0.17
7  QR      1.374    6.46      6.25     0.20
8  LLt     1.436    6.75      6.44     0.32
1  lm.fit  3.853   18.11     18.03     0.03
6  SymmEig 5.700   26.79     26.57     0.22
2  PivQR   9.783   45.98     26.75    19.20
9  arma   19.155   90.03     89.72     0.29
4  GESDD  19.313   90.77     90.53     0.22
5  SVD   184.362  866.50    865.94     0.39
10 GSL   188.672  886.76    886.19     0.21

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

These timings are for a 1650 × 875 matrix, rather than the default of 100,000 × 40. The results are highly dependent on the matrix dimensions, so you should use a size that is representative of the data that you are working with. The benchmark was run on a 2GHz Intel Core i7-4750HQ with Windows 10. You can compare these results to Table 12.2 on pg. 191 of Eddelbuettel (2013).
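To change the problem size, one approach (assuming, as in my reading of the script, that the dimensions are assigned near the top of lmBenchmark.R rather than passed as arguments) is to edit a local copy before sourcing it:

```r
## Sketch: copy the benchmark script locally so that n and p can be edited
## to match the dimensions of your own data.
src <- system.file("examples", "lmBenchmark.R", package = "RcppEigen")
file.copy(src, "lmBenchmark.R")
## ... edit n and p in the local copy, then run it:
source("lmBenchmark.R")
```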

I installed OpenBLAS from SourceForge (with the usual caveats about downloading binaries from SourceForge). Unfortunately, the usual method of replacing libRblas with libopenblas (or a symlink) is a one-way trip to DLL hell on Windows. If you start getting an error message that libgcc_s_seh-1.dll is missing, switch back to the original Rblas.dll (you did make a backup copy, right?). There are drop-in replacements using SurviveGotoBLAS 3.14 for some CPU architectures available here, but unfortunately not for the 4th-generation Haswell with SSE 4.2 and AVX2 instructions.
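Before attempting any swap, make that backup copy. A minimal sketch, run from within R (the exact path depends on your R version and installation directory; the copy itself should be done with R closed, e.g. from a Command Prompt):

```r
## Back up the reference BLAS before replacing it, so the swap can be reverted.
## R.home("bin") points at the architecture-specific bin directory,
## e.g. C:/Program Files/R/R-3.2.3/bin/x64 on 64-bit Windows.
bin <- R.home("bin")
file.copy(file.path(bin, "Rblas.dll"),
          file.path(bin, "Rblas.dll.orig"))
## Then, with R closed, copy the replacement DLL over Rblas.dll from a shell:
##   copy libopenblas.dll "C:\Program Files\R\R-3.2.3\bin\x64\Rblas.dll"
```

If R later fails to start (or complains about a missing DLL), restoring Rblas.dll.orig over Rblas.dll undoes the change.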

These are the results for the SurviveGotoBLAS binary (Nehalem architecture):

lm benchmark for n = 1650 and p = 875: nrep = 20
 user system elapsed 
2012.86 115.13 2072.31 

   test relative elapsed user.self sys.self
3  LDLt    1.000    4.71      4.45     0.25
7  QR      1.465    6.90      6.59     0.30
8  LLt     1.643    7.74      7.47     0.27
4  GESDD   4.662   21.96     38.75     5.33
6  SymmEig 6.270   29.53     29.24     0.28
9  arma    6.696   31.54     61.69    11.65
2  PivQR   9.688   45.63     26.45    19.10
1  lm.fit 25.607  120.61     36.31    77.37
5  SVD   190.735  898.36    897.09     0.43
10 GSL   191.970  904.18    903.67     0.13

RcppArmadillo (“arma”) improved from 90.0 s elapsed time to 31.5 s, almost a 3× speedup. Likewise, GESDD improved from 90.8 s to 22.0 s. However, lm.fit is slower at 120.6 s elapsed. The 77 s spent in sys.self is likely due to threading overhead. Clearly, this is far from the desired outcome when switching BLAS implementations. There is a workaround for this issue: install the R package RhpcBLASctl, which offers a function blas_set_num_threads(..) that you can use to force the BLAS to run single-threaded.
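The workaround is a one-liner before running the benchmark. A short sketch:

```r
## Force the BLAS to run single-threaded before sourcing the benchmark.
library(RhpcBLASctl)
blas_set_num_threads(1)
blas_get_num_procs()   # check the current setting
source(system.file("examples", "lmBenchmark.R", package = "RcppEigen"))
```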

Results for single-threaded SurviveGotoBLAS were as follows:

lm benchmark for n = 1650 and p = 875: nrep = 20
 user system elapsed 
1894.76 23.33 1919.85

   test relative elapsed user.self sys.self
3  LDLt    1.000    4.67      4.46     0.21
7  QR      1.355    6.33      6.08     0.25
8  LLt     1.422    6.64      6.43     0.22
1  lm.fit  1.623    7.58      7.50     0.08
6  SymmEig 5.728   26.75     26.39     0.33
9  arma    6.503   30.37     29.89     0.40
4  GESDD   7.017   32.77     32.45     0.32
2  PivQR  10.107   47.20     26.33    20.86
5  SVD   186.285  869.95    868.83     0.45
10 GSL   189.852  886.61    885.47     0.19

As described by Avraham Adler, the alternative is to install MSYS2 and MinGW-w64, then compile both OpenBLAS and R from source. Once again, Windows is the redheaded stepchild of platforms for running R.

Note that the GNU Scientific Library (GSL) is also available for Windows. If you want to run lmBenchmark with RcppGSL, then you will also need to install it. However, GSL uses its own libgslcblas.dll, so it won’t benefit from installing OpenBLAS as described above. I don’t know why gsl_multifit_linear(..) is so slow in comparison to all of the other implementations; I have observed similarly poor RcppGSL performance when running lmBenchmark on Linux and OS X.



