
Fitting nonlinear functions in Stan

June 12, 2017

This post follows up on a previous one, in which I showed that the R function nls() gives biased estimates in the presence of heteroskedastic, truncated noise. The nlme package provides the function gnls() for generalised nonlinear least squares, but this seemed to involve defining a custom varFunc class to reweight the observations; for more detail on that option, refer to ch. 5 of Pinheiro & Bates (2000). Instead, I show how I formulated the likelihood in the Stan modelling language and estimated the rate parameter using Hamiltonian Monte Carlo (HMC). Thanks very much to Bob Carpenter for his help in getting this code to work.

To briefly recap, the mean function is characterised by the following differential equation:

\frac{d\mu}{dt} = r\mu \left( 1 - \frac{\mu}{L} \right)

which is a logistic curve with rate parameter r and a horizontal asymptote at L. The solution for an initial value \mu_0 is:

f(t) = \frac{\mu_0 L e^{rt}}{L + \mu_0 (e^{rt} - 1)}

The observations are generated from a heteroskedastic, truncated normal distribution where the variance is equal to the gradient of the mean:

Y_t \sim \mathcal{N}\left( f(t), \frac{\mathrm{d}f(t)}{\mathrm{d}t} \right)\mathbb{I}_{(0 \le Y_t \le L)}
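Differentiating the solution gives a closed form for this gradient, which is what the dfdt function in the Stan code below computes (up to the tC reparameterisation):

\frac{\mathrm{d}f(t)}{\mathrm{d}t} = r f(t) \left( 1 - \frac{f(t)}{L} \right) = \frac{r \mu_0 L (L - \mu_0) e^{rt}}{\left( L + \mu_0 (e^{rt} - 1) \right)^2}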

Simulating data from this model is straightforward, for example using the rtruncnorm() function from the R package truncnorm:
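Here is a minimal sketch of that simulation. The parameter values (r = 2, L = 20, mu0 = 1) and the time grid are my assumptions for illustration; as in the Stan code below, the gradient of the mean is passed directly as the scale of the noise.

```r
# Simulation sketch: logistic mean with truncated, heteroskedastic noise.
# Assumed values (not from the post): r = 2, L = 20, mu0 = 1.
library(truncnorm)  # provides rtruncnorm()

r   <- 2                    # growth rate
L   <- 20                   # horizontal asymptote
mu0 <- 1                    # initial value
t   <- seq(0, 5, by = 0.1)  # observation times

# solution of the logistic ODE
f <- mu0 * L * exp(r * t) / (L + mu0 * (exp(r * t) - 1))

# gradient of the mean, used as the scale of the noise
df <- r * f * (1 - f / L)

# truncated normal observations on [0, L]
y <- rtruncnorm(length(t), a = 0, b = L, mean = f, sd = df)
```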


This is what the model looks like in the Stan modelling language (a probabilistic programming language in the BUGS tradition):

functions {
  vector ft(vector t, real r, real y0, real tC, real L) {
    vector[num_elements(t)] mu;
    vector[num_elements(t)] exprt;
    exprt = exp(r*(t - tC));
    mu = y0*L*exprt ./ (L + y0*(exprt - 1));
    return mu;
  }

  vector dfdt(vector t, real r, real tC, real L) {
    vector[num_elements(t)] dmu;
    vector[num_elements(t)] sddenom;
    for (i in 1:num_elements(t)) {
      sddenom[i] = (exp(r*tC) + exp(r*t[i]))^(-2);
    }
    dmu = r*L*exp(r*(t + tC)) .* sddenom;
    return dmu;
  }
}
data {
  int<lower = 1> M;
  int<lower = 1> N;
  real<lower = 0> mu0;
  real<lower = 1> maxY;
  real tcrit;
  matrix<lower = 0, upper = maxY>[M, N] y;
  vector[M] t;
}
parameters {
  real<lower = 0> r;
}
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_sd;
  curr_mu = ft(t, r, mu0, tcrit, maxY);
  curr_sd = dfdt(t, r, tcrit, maxY);
}
model {
  for (i in 1:M) {
    for (j in 1:N) {
      y[i,j] ~ normal(curr_mu[i], curr_sd[i]) T[0, maxY];
    }
  }
}

When we try to fit this model, Stan gives the following error:

[1] "The following numerical problems occured the indicated number of times after warmup on chain 1"
Exception thrown at line 28: normal_log: Scale parameter is 0, but must be > 0! 35

This indicates that there are problems with numerical precision in the tails of the distribution, as \sigma \rightarrow 0. To avoid this, I added an \epsilon lower bound on the standard deviation:

   curr_sd = dfdt(t, r, tcrit, maxY) + eps;
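One way to do this (a sketch; passing eps in as data and its value of 1e-6 are my assumptions, not from the original post) is:

data {
  // ... existing declarations ...
  real<lower = 0> eps;  // small lower bound on the scale, e.g. 1e-6
}
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_sd;
  curr_mu = ft(t, r, mu0, tcrit, maxY);
  curr_sd = dfdt(t, r, tcrit, maxY) + eps;
}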

Even after fixing this problem, I still had the dreaded divergent transitions:

Warning messages:
1: There were 2155 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See 
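For reference, adapt_delta is set through rstan's control argument; a sketch, assuming the model file name GrowthCurve4.stan and a data list called stan_dat (my name, not the post's):

library(rstan)

# stan_dat would hold M, N, mu0, maxY, tcrit, y and t as in the data block
fit <- stan(file = "GrowthCurve4.stan", data = stan_dat,
            chains = 4, iter = 2000, warmup = 1000,
            control = list(adapt_delta = 0.99))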

I tried increasing adapt_delta as suggested, but I still got divergent transitions even at adapt_delta=0.99. Even adding a prior on r didn’t really help. In the end, I dropped the truncated normal distribution:

model {
  for (i in 1:M) {
    y[i] ~ normal(curr_mu[i], curr_sd[i]);
  }
}

Even though this meant that the model was slightly misspecified, the true value of r was now well within the region of highest posterior density:

Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
r     1.98       0 0.02 1.95 1.97 1.98    2  2.02  3079    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 15:17:27 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).


Finally, I can modify the model so that both r and mu0 are treated as unknown parameters:

parameters {
  real<lower = 0> r;
  real<lower = 0> mu0;
}

In this case, with r=2.5 and mu0=4, the true values are within the region of posterior support:

Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

      mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
r     2.43    0.00 0.04 2.35 2.41 2.43 2.45  2.50  1761    1
mu0   3.80    0.01 0.30 3.24 3.59 3.79 3.99  4.42  1702    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 16:00:23 2017.



From → Functional Data, R

  1. Does this imply a flat prior on r such that it’s non-negative? nls() parameterises in terms of log(r), so I’d normally be using a gamma on r or a normal on log(r).

    • The code above just uses the default Stan priors (which I’d assume are half-Cauchy?) but you could specify priors explicitly, just like you would do in JAGS or any other BUGS language. For example, see the Stan code on GitHub:

      Note that I had to increase adapt_delta due to divergent transitions when I added the gamma prior for r, but adapt_delta=0.9 fixed the problem.

    • Dan Simpson

      The lower=0 declaration automatically transforms the parameter to the log scale. See the Stan manual.

Trackbacks & Pingbacks

  1. Parametric Functional Approximate Bayesian (PFAB) algorithm | Matt Moores
