# Fitting nonlinear functions in Stan

This post follows up on a previous one, where I showed that the R function nls() gives biased estimates in the presence of heteroskedastic, truncated noise. The **nlme** package provides the function gnls() for generalised least squares, but this seemed to involve defining a custom varFunc class to reweight the observations; for more detail on that option, refer to ch. 5 of Pinheiro & Bates (2000). Instead, I show how I formulated the likelihood in the Stan modelling language and estimated the rate parameter using Hamiltonian Monte Carlo (HMC). Thanks very much to Bob Carpenter for his help in getting this code to work.

To briefly recap, the mean function is characterised by the following differential equation:

$$\frac{d\mu}{dt} = r\,\mu\left(1 - \frac{\mu}{L}\right)$$

which is a logistic curve with rate parameter $r$ and a horizontal asymptote at $L$. The solution for an initial value $\mu(t_c) = \mu_0$ is:

$$\mu(t) = \frac{\mu_0\, L\, e^{r(t - t_c)}}{L + \mu_0\left(e^{r(t - t_c)} - 1\right)}$$

The observations are generated from a heteroskedastic, truncated normal distribution where the variance is equal to the gradient of the mean:

$$y_{ij} \sim \mathcal{N}\left(\mu(t_i),\, \sigma_i^2\right) \text{ truncated to } [0, L], \qquad \sigma_i^2 = \left.\frac{d\mu}{dt}\right|_{t = t_i}$$

Simulating data from this model is straightforward, for example using the rtruncnorm() function from the R package **truncnorm**:
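The post's original simulation code isn't reproduced here, but the data-generating process can be sketched in Python using only the standard library (all numeric settings below are illustrative, and a plain rejection sampler stands in for rtruncnorm()):

```python
import math
import random

# Illustrative parameter values (not necessarily those used in the post).
# mu0 = L/2 makes tC the inflection point, matching the dfdt() gradient
# used in the Stan model below.
r, mu0, tC, L = 2.0, 5.0, 1.0, 10.0
M, N = 25, 4  # number of observation times, replicates per time

def mean_fn(t):
    """Logistic mean: solution of the ODE with mu(tC) = mu0."""
    e = math.exp(r * (t - tC))
    return mu0 * L * e / (L + mu0 * (e - 1.0))

def grad_fn(t):
    """Gradient of the logistic curve; used as the scale of the noise."""
    e = math.exp(-r * (t - tC))
    return r * L * e / (1.0 + e) ** 2

def rtruncnorm(mu, sd, lo, hi, rng):
    """Rejection sampler for a normal truncated to [lo, hi]."""
    while True:
        x = rng.gauss(mu, sd)
        if lo <= x <= hi:
            return x

rng = random.Random(42)
ts = [3.0 * i / (M - 1) for i in range(M)]
y = [[rtruncnorm(mean_fn(t), grad_fn(t), 0.0, L, rng) for _ in range(N)]
     for t in ts]  # y[i][j] is replicate j observed at time ts[i]
```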

This is what it looks like in the Stan modelling language (a descendant of the BUGS family of languages):

```stan
functions {
  vector ft(vector t, real r, real y0, real tC, real L) {
    vector[num_elements(t)] mu;
    vector[num_elements(t)] exprt;
    exprt = exp(r*(t - tC));
    mu = y0*L*exprt ./ (L + y0*(exprt - 1));
    return mu;
  }
  vector dfdt(vector t, real r, real tC, real L) {
    vector[num_elements(t)] dmu;
    vector[num_elements(t)] sddenom;
    for (i in 1:num_elements(t)) {
      sddenom[i] = (exp(r*tC) + exp(r*t[i]))^(-2);
    }
    dmu = r*L*exp(r*(t + tC)) .* sddenom;
    return dmu;
  }
}
data {
  int<lower = 1> M;
  int<lower = 1> N;
  real<lower = 0> mu0;
  real<lower = 1> maxY;
  real tcrit;
  matrix<lower = 0, upper = maxY>[M,N] y;
  vector[M] t;
}
parameters {
  real<lower = 0> r;
}
transformed parameters {
  vector[M] curr_mu;
  vector[M] curr_sd;
  curr_mu = ft(t, r, mu0, tcrit, maxY);
  curr_sd = dfdt(t, r, tcrit, maxY);
}
model {
  for (i in 1:M) {
    for (j in 1:N) {
      y[i,j] ~ normal(curr_mu[i], curr_sd[i]) T[0,maxY];
    }
  }
}
```

When we try to fit this model, Stan gives the following error:

```
[1] "The following numerical problems occured the indicated number of times after warmup on chain 1"
                                                                                count
Exception thrown at line 28: normal_log: Scale parameter is 0, but must be > 0!    35
```

This indicates that there are problems with numerical precision in the tails of the distribution, as $\frac{d\mu}{dt} \to 0$ when $t$ moves away from $t_c$ and the computed scale underflows to exactly zero. To avoid this, I added a lower bound on the standard deviation:

```stan
curr_sd = dfdt(t, r, tcrit, maxY) + eps;
```
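The root cause can be reproduced outside Stan: in double precision, the (exp(r*tC) + exp(r*t))^(-2) term in dfdt() underflows to exactly zero once r*t is large enough, before it can be multiplied by the large exponential in the numerator. A minimal Python sketch, with parameter values chosen purely for illustration:

```python
import math

def dfdt(t, r=2.0, L=10.0, tC=1.0, eps=0.0):
    """Mirror of the Stan dfdt(), with an optional lower bound eps."""
    sddenom = (math.exp(r * tC) + math.exp(r * t)) ** (-2)
    return r * L * math.exp(r * (t + tC)) * sddenom + eps

print(dfdt(5.0))              # small but positive
print(dfdt(190.0))            # 0.0: sddenom underflows before the product is formed
print(dfdt(190.0, eps=1e-8))  # the lower bound keeps the scale positive
```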

Even after fixing this problem, I still had the dreaded *divergent transitions*:

```
Warning messages:
1: There were 2155 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
```

I tried increasing adapt_delta as suggested, but I still got divergent transitions even at adapt_delta=0.99. Adding a prior on r didn't really help either. In the end, I dropped the truncation from the normal distribution:

```stan
model {
  for (i in 1:M) {
    y[i,] ~ normal(curr_mu[i], curr_sd[i]);
  }
}
```

Even though this meant that the model was slightly misspecified, the true value of r was now well within the region of highest posterior density:

```
Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

  mean se_mean   sd 2.5%  25%  50% 75% 97.5% n_eff Rhat
r 1.98       0 0.02 1.95 1.97 1.98   2  2.02  3079    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 15:17:27 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

I can modify the model so that both r and mu0 are treated as unknown parameters:

```stan
parameters {
  real<lower = 0> r;
  real<lower = 0> mu0;
}
```

In this case, with r=2.5 and mu0=4, the true values are within the region of posterior support:

```
Inference for Stan model: GrowthCurve4.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

    mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
r   2.43    0.00 0.04 2.35 2.41 2.43 2.45  2.50  1761    1
mu0 3.80    0.01 0.30 3.24 3.59 3.79 3.99  4.42  1702    1

Samples were drawn using NUTS(diag_e) at Wed Apr 19 16:00:23 2017.
```

Does this imply a flat prior on r, constrained to be non-negative? nls() parameterises the model in terms of log(r), so I'd normally use a gamma prior on r or a normal prior on log(r).

The code above just uses the default Stan priors, which are improper flat priors over the declared support, but you can specify priors explicitly, just as you would in JAGS or any other BUGS dialect. For example, see the Stan code on GitHub: https://gist.github.com/mooresm/62765b7729ed8a9f63211d58a6eea5d5

Note that adding the gamma prior for r initially produced divergent transitions again, but increasing adapt_delta to 0.9 fixed the problem.

The lower=0 constraint means that Stan automatically samples r on the log scale and transforms back, with the appropriate Jacobian adjustment; see the Stan reference manual.
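To illustrate the idea of that transform (a sketch only, not Stan's actual internals): for a parameter declared with lower=0, Stan samples an unconstrained value z, maps it back via r = exp(z), and adds the log absolute Jacobian log|dr/dz| = z to the target log density.

```python
import math

def constrain(z):
    """Map an unconstrained draw z to the positive scale: r = exp(z)."""
    return math.exp(z)

def log_abs_jacobian(z):
    """log|dr/dz| for r = exp(z); the derivative is exp(z), so this is z."""
    return z

# An unconstrained proposal z = -0.5 corresponds to r = exp(-0.5),
# and the target log density gains a correction term of -0.5.
z = -0.5
r = constrain(z)
correction = log_abs_jacobian(z)
```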