
Numerical stability

April 19, 2013

Using log-probabilities can make an algorithm more numerically stable, particularly when that algorithm involves the product of many probabilities. Long products can underflow to 0 (when the factors are small, as probabilities are) or overflow to Inf (when they are large). The sum of the logs is resilient to both problems, since it can represent much larger and much smaller magnitudes. For example:

prod(1:1000)

## [1] Inf

sum(log(1:1000))

## [1] 5912

prod(1/(1:1000))

## [1] 0

sum(log(1/(1:1000)))

## [1] -5912
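The same idea applies to likelihood calculations, where the product of many densities underflows even though the log-likelihood is perfectly representable. A minimal sketch, assuming standard normal data (this example is mine, not from the original experiments):

set.seed(1)
x <- rnorm(1000)
prod(dnorm(x))             # the product of 1000 densities underflows to 0
sum(dnorm(x, log = TRUE))  # the log-likelihood is finite (roughly -1400)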

Converting the product into a sum of logs might be expected to yield a slight speedup as well, although as I’ve previously noted it actually makes my code slower. I’ve resurrected my log-probability code from my Git stash and run some experiments to determine what I gain, as well as to measure the computational cost. Details after the jump.
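One plausible reason the log version is slower: sum(log(x)) evaluates a transcendental log() for every element, whereas prod(x) is a single pass of multiplications. A rough timing sketch (illustrative only; this is not the pseudolikelihood code):

x <- runif(1e6)
system.time(for (i in 1:100) prod(x))      # plain multiplications
system.time(for (i in 1:100) sum(log(x)))  # one log() call per element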

I don’t have the exact distribution of runtimes from the old code (without sums of logs), but suffice it to say I was able to reliably perform 100K iterations of pseudolikelihood in less than 24 hours. This matters because jobs that run in under 24 hours have their own queue on the supercomputer, so the longer runtime is compounded by longer waits in the queue.

This is what the distribution of runtimes looks like for 100K iterations of log-pseudolikelihood:

load("data/results_20130417.rda")  # loads result.pseudo from the previous run
hist(as.double(result.pseudo$elapsed), main = "runtime of log-pseudolikelihood",
    xlab = "elapsed time (hours)", breaks = 12, prob = TRUE, xlim = c(12, 36))
lines(density(as.double(result.pseudo$elapsed)), col = "blue")  # kernel density overlay

[Figure: histogram of runtimes (hours) for 100K iterations of log-pseudolikelihood, with kernel density overlay]

Six of the PBS jobs exceeded the 24-hour time limit and had to be rerun on the long queue:

=>> PBS: job killed: walltime 86458 exceeded limit 86400

Those resubmitted jobs then spent anywhere from 1.44 to 8.35 hours waiting in the queue.



One Comment
  1. This is also a handy trick, particularly for simulating from mixture distributions using the log-likelihood:
    http://jblevins.org/log/log-sum-exp
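For reference, here is a minimal sketch of the log-sum-exp trick described at that link (the helper name logSumExp is illustrative, not from the linked post):

logSumExp <- function(lp) {
    # shift by the max so the largest term exponentiates to exp(0) = 1,
    # which avoids underflow/overflow in exp()
    m <- max(lp)
    m + log(sum(exp(lp - m)))
}

log(sum(exp(c(-1000, -1001))))  # naive version underflows to -Inf
logSumExp(c(-1000, -1001))      # stable: about -999.69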

