
Numerical stability

April 19, 2013

Using log-probabilities can make an algorithm more numerically stable, particularly when that algorithm involves the product of many probabilities. Multiplying many numbers together can lead to floating-point underflow or overflow, resulting in a value of 0 or Inf. A sum of logs is far more resilient to this problem, because it can represent the logarithms of much larger (and much smaller) numbers than double precision can hold directly. For example:

prod(1:1000)

## [1] Inf

sum(log(1:1000))

## [1] 5912

prod(1/(1:1000))

## [1] 0

sum(log(1/(1:1000)))

## [1] -5912
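The same thing happens with genuine probability densities. As a minimal sketch (the data are hypothetical: 1000 observations all equal to 2, evaluated under a standard normal), R's density functions accept `log = TRUE`, which returns the log-density directly instead of taking `log()` of a value that may already have underflowed:

```r
# Hypothetical data: 1000 observations, each equal to 2
x <- rep(2, 1000)

# Product of densities: each term is about 0.054, so the product underflows
lik <- prod(dnorm(x))

# Sum of log-densities, computed directly on the log scale
loglik <- sum(dnorm(x, log = TRUE))

lik     # 0
loglik  # about -2918.9
```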

In principle, converting the product into a sum of logs should also give a slight speedup, although, as I’ve previously noted, it actually makes my code slower. I’ve resurrected my log-probability code from my Git stash and run some experiments to determine what I gain in stability, as well as to measure the computational cost.
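One plausible reason the log version is not faster (my assumption, not something profiled in the pseudolikelihood code itself) is that calling `log()` on every term costs more than the multiplications it replaces, so the conversion only pays off when values are already stored on the log scale. A minimal sketch of that comparison using base R's `system.time()`, with hypothetical random probabilities:

```r
set.seed(42)
# Hypothetical probabilities, kept close to 1 so that the direct
# product of 1e5 terms stays within double-precision range
p <- runif(1e5, min = 0.999, max = 1)
reps <- 100

# (a) multiply, then take a single log at the end
t_prod <- system.time(for (i in seq_len(reps)) lp_prod <- log(prod(p)))

# (b) take the log of every term, then add
t_logsum <- system.time(for (i in seq_len(reps)) lp_logsum <- sum(log(p)))

# (c) add terms that are already stored on the log scale
logp <- log(p)
t_sum <- system.time(for (i in seq_len(reps)) lp_sum <- sum(logp))

# All three give the same log-probability, up to rounding
all.equal(lp_prod, lp_logsum)

# Elapsed times in seconds; these depend on hardware and R build
c(prod = t_prod[["elapsed"]],
  log_then_sum = t_logsum[["elapsed"]],
  sum_only = t_sum[["elapsed"]])
```

The only thing this sketch establishes for certain is that the three approaches agree numerically; which one wins on time depends on the machine.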

I don’t have the exact distribution of runtimes from the old code (without sums of logs), but suffice to say I was able to reliably perform 100K iterations of pseudolikelihood in less than 24 hours. This matters because jobs that run in under 24 hours have their own queue on the supercomputer. Any job that exceeds 24 hours has to be resubmitted to the long queue, where it waits longer before it starts, so a longer runtime is compounded by extra queueing time.

This is what the distribution of runtimes looks like for 100K iterations of log-pseudolikelihood:

load("data/results_20130417.rda")  # loads result.pseudo from the previous run
elapsed <- as.double(result.pseudo$elapsed)  # runtimes in hours
hist(elapsed, main = "runtime of log-pseudolikelihood",
    xlab = "elapsed time (hours)", breaks = 12, prob = TRUE, xlim = c(12, 36))
lines(density(elapsed), col = "blue")
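A small hypothetical helper, `over_limit()`, makes the 24-hour cutoff explicit by counting how many runs exceed it. It assumes the elapsed times are in hours, as the axis label suggests; applied to the real results it would be called as `over_limit(result.pseudo$elapsed)`. The usage line below runs it on toy values, not the actual runtimes:

```r
# Hypothetical helper: how many runs would exceed a walltime limit?
# Assumes `elapsed` is in hours.
over_limit <- function(elapsed, limit = 24) {
  elapsed <- as.double(elapsed)
  n_over <- sum(elapsed > limit)
  list(n = length(elapsed),
       n_over = n_over,
       prop_over = n_over / length(elapsed),
       # largest overrun in hours (0 if no run exceeds the limit)
       max_overrun = max(c(0, elapsed - limit)))
}

# Toy values only: two of the four runs exceed 24 hours
over_limit(c(20.5, 23.9, 24.1, 25.3))
```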

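[Figure: histogram of the runtime of log-pseudolikelihood (elapsed time in hours), with a kernel density estimate overlaid in blue]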

Six of the PBS jobs exceeded the 24-hour walltime limit and had to be rerun in the long queue:
=>> PBS: job killed: walltime 86458 exceeded limit 86400
The resubmitted jobs then spent between 1.44 and 8.35 hours waiting in the queue.
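The walltime in the PBS message is in seconds: the limit of 86400 s is exactly 24 hours, and this job was killed 58 seconds past it. A small sketch (the helper name `pbs_overrun` is hypothetical, and it assumes the message has the same "walltime N exceeded limit M" wording as the line above) that extracts both numbers and reports the overrun:

```r
# Hypothetical helper: parse a PBS "walltime exceeded" message and
# return the walltime, the limit and the overrun in seconds,
# plus the limit converted to hours.
pbs_overrun <- function(msg) {
  walltime <- as.numeric(sub(".*walltime ([0-9]+).*", "\\1", msg))
  limit <- as.numeric(sub(".*limit ([0-9]+).*", "\\1", msg))
  c(walltime = walltime, limit = limit,
    overrun = walltime - limit, limit_hours = limit / 3600)
}

msg <- "=>> PBS: job killed: walltime 86458 exceeded limit 86400"
pbs_overrun(msg)  # overrun = 58 seconds; limit_hours = 24
```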


One Comment
  1. This is also a handy trick, particularly for simulating from mixture distributions using the log-likelihood:
    http://jblevins.org/log/log-sum-exp
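The log-sum-exp trick normalises weights that are only known on the log scale without exponentiating them directly, which is what sampling mixture components from log-likelihoods requires. A minimal sketch, assuming the unnormalised log-weights have already been computed (the values below are hypothetical):

```r
# log(sum(exp(x))), computed stably by factoring out the maximum
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Hypothetical unnormalised log-weights for three mixture components;
# exp() of each of these underflows to 0 in double precision
log_w <- c(-1000, -1001, -1003)

# Normalised component probabilities
probs <- exp(log_w - log_sum_exp(log_w))

# Draw 10 component labels with these probabilities
set.seed(1)
z <- sample(seq_along(probs), size = 10, replace = TRUE, prob = probs)
```

Subtracting the maximum before exponentiating guarantees that the largest term is exp(0) = 1, so the sum can never underflow to 0.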

