
Scalable nonparametric inference for random graphs

May 24, 2017

This paper by François Caron and Emily Fox has been on arXiv for a while, but a fortnight ago it was read at an Ordinary Meeting of the RSS. "Sparse graphs using exchangeable random measures" (J. R. Stat. Soc. Series B, 2017) enables simulation and Bayesian analysis of graphs with hundreds of thousands of nodes and over a million edges, a major step forward for computationally tractable inference on datasets of this size. My thoughts on the paper and some preliminary experimental results are below.

The scalability of this approach comes from representing the discrete graph structure with a latent continuous model, the generalised gamma process (GGP). The sociability parameter w_i of each node can be sampled using Hamiltonian Monte Carlo (HMC), since the gradient of the conditional log-posterior is available in closed form:

\left[ \nabla_{\omega_{1:N_\alpha}} \log\{p(\omega_{1:N_\alpha}, w_* \mid D_\alpha )\} \right]_i = m_i - \sigma - w_i\left(\tau + 2\sum_{j=1}^{N_\alpha}w_j + 2w_* \right)

where \omega_i = \log\{w_i\}; N_\alpha is the number of nodes with degree \ge 1; w_* is the total leftover sociability of the zero-degree nodes; D_\alpha is the observed directed multigraph, whose entries n_{i,j} count the edges from node i to node j; m_i = \sum_{j=1}^{N_\alpha} (n_{i, j} + n_{j,i}) is the degree of node i; and \alpha, \sigma, \tau are the hyperparameters of the GGP.
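
For concreteness, here is a minimal MATLAB sketch of that gradient, assuming w holds the N_\alpha sociabilities, m the corresponding degrees, and w_star the leftover mass (the variable names are mine, not those of the reference implementation):

    % Gradient of the conditional log-posterior with respect to omega = log(w),
    % as in the formula above. w and m are N_alpha-by-1 vectors; sigma, tau and
    % w_star are scalars.
    function grad = grad_logpost_omega(w, m, sigma, tau, w_star)
        grad = m - sigma - w .* (tau + 2*sum(w) + 2*w_star);
    end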

Full details of the algorithm are provided in Appendix F of the paper, and the supplementary material contains a reference implementation. Further improvements in scalability might be achieved by exploiting parallelism, as the authors mention. The source code is written in pure MATLAB, without any .mex files (compiled C code), and depends only on the Statistics Toolbox for random number generation. It would be interesting to see how Stan would perform on this model, or alternatively a C++ implementation using Rcpp. For an idea of what is possible, check out the R package BradleyTerryScalable by Ella Kaye and David Firth.

The pair potentials of the GGP are symmetric:

\Pr(z_{i, j} = 1 \mid w_i, w_j) = \Pr(n_{i, j} + n_{j,i} > 0 \mid w_i, w_j) = 1 - \exp\{-2 w_i w_j\}

where i \ne j. This implies a very different generative process from the adversarial graphs typical of Bradley-Terry or Plackett-Luce models, where nodes compete with each other for edges. Instead, conditional on the sociabilities, the directed edge counts n_{i,j} are independent Poisson variables with mean w_i w_j.
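
As a toy illustration of this conditional independence (not the paper's sampler, which samples the weights from the GGP and works with sparse representations), one can simulate a small multigraph given arbitrary sociabilities:

    % Given sociabilities w, the directed counts are conditionally independent
    % Poisson with mean w_i * w_j; the undirected edge z_ij is present whenever
    % n_ij + n_ji > 0. Dense, and only sensible for small N.
    N = 5;
    w = gamrnd(1, 1, N, 1);            % placeholder weights, not drawn from a GGP
    n = poissrnd(w * w');              % directed multigraph counts
    z = (n + n' > 0) & ~eye(N);        % symmetric adjacency, ignoring self-loops
    p = 1 - exp(-2 * (w * w'));        % closed-form Pr(z_ij = 1) for i ~= j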

To illustrate the difference between these two approaches, I computed the sociability w_i of each node under the GGP model, as well as its PageRank (Brin & Page, 1998). Famously, PageRank is the original ranking algorithm used by Google. It is a spectral method: the ranks correspond to the principal eigenvector of a damped transition matrix built from the adjacency structure of the graph, where the nodes are websites and the edges are hyperlinks. Equivalently, PageRank is the stationary distribution of a Markov chain describing a random surfer clicking through the World-Wide Web.
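
The Markov-chain view leads directly to the usual power-iteration sketch below, assuming a sparse binary adjacency matrix A for the same graph (rows as sources, columns as targets) and the standard damping factor of 0.85:

    % Power iteration for PageRank. Dangling nodes (zero out-degree) have their
    % mass redistributed uniformly, so pr always sums to one.
    d   = 0.85;
    N   = size(A, 1);
    out = full(sum(A, 2));                 % out-degrees
    pr  = ones(N, 1) / N;                  % start from the uniform distribution
    for it = 1:100
        dangling = sum(pr(out == 0)) / N;
        pr = (1 - d)/N + d * (A' * (pr ./ max(out, 1)) + dangling);
    end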

[Figure: Symmetrised adjacency matrix Z_\alpha for the WWW graph]

PageRank has proven to be highly successful for datasets such as the WWW graph from the University of Notre Dame (Albert, Jeong & Barabási, 1999). This graph has 325,729 nodes, far larger than was previously feasible for MCMC-based inference. The matrix D_\alpha has 1,497,134 directed edges, while the symmetric adjacency matrix Z_\alpha has 1,090,108 \times 2 nonzero entries. The GGP model can also handle directed multigraphs, but this makes little difference here because of the symmetric pair potentials shown above.
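
Building the symmetrised adjacency matrix from the raw edge list is straightforward; here is a sketch assuming the Notre Dame edges are already in memory as a two-column matrix E of (source, target) node ids, 1-indexed for simplicity:

    % Directed count matrix and symmetric 0/1 adjacency for the WWW graph.
    N = max(E(:));
    D = sparse(E(:,1), E(:,2), 1, N, N);   % directed (multi)graph counts
    Z = spones(D + D');                    % symmetrise and binarise
    Z = Z - diag(diag(Z));                 % drop self-loops
    fprintf('directed edges %d, nonzeros in Z %d\n', nnz(D), nnz(Z));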

[Figure: MCMC trace plot for \sigma, indicating 20k iterations discarded as burn-in]

I ran the MATLAB code for 100k iterations with 2 parallel chains, which took 4 hours per chain on my iMac (3.5GHz Intel Core i7 with 16GB RAM). The chains appeared to converge after 20k iterations, as shown in the trace plot for one of the GGP parameters, \sigma. The 99% posterior credible interval was [0.368, 0.375], which differs markedly from the estimate in Table 2 of the paper (p. 29). This might be due to mixing and convergence issues in the paper's shorter run (40k iterations, with 20k discarded as burn-in). Memory usage climbed to over 30GB, which forced a significant amount of swapping to disk. It would definitely be worthwhile to run this code for longer on a Linux cluster with more RAM, to check whether the MCMC has in fact converged as the trace plots suggest. The runtime of 132 minutes reported in the paper seems quite optimistic for this dataset, which might also explain why the authors did not consider any larger graphs.
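
As a check on the numbers quoted above, the credible interval can be recovered from the stored draws of \sigma after discarding the burn-in; sigma_samples below is a placeholder for whatever variable the reference implementation returns:

    % 99% equal-tailed posterior credible interval for sigma from one chain.
    burnin = 20000;
    post   = sigma_samples(burnin+1:end);
    ci99   = quantile(post, [0.005 0.995]);
    fprintf('sigma: median %.3f, 99%% CI [%.3f, %.3f]\n', median(post), ci99(1), ci99(2));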

[Figure: 95% posterior predictive interval for the degree distribution]

PageRank can be computed using the centrality function in MATLAB, or using the igraph package in R. The 3D histogram below shows that the relationship between \log(w_i) and \log(PR) is highly nonlinear. Many nodes are assigned a PageRank below 6e-6 (\log(PR) < -12), whereas w_i is more spread out, ranging from about 2e-9 to 4.5e-5.
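
A sketch of that comparison, assuming a sparse directed adjacency matrix D for the WWW graph and a vector w_mean of posterior mean sociabilities (both placeholders for whatever is in your workspace):

    % PageRank via MATLAB's built-in graph functions, then a log-log 3D
    % histogram against the GGP sociabilities. Assumes w_mean and pr are
    % aligned over the same set of nodes.
    G  = digraph(spones(D));                  % binary directed graph
    pr = centrality(G, 'pagerank');           % damping factor 0.85 by default
    hist3([log(w_mean), log(pr)], [50 50]);   % requires the Statistics Toolbox
    xlabel('log w_i'); ylabel('log PageRank');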

[Figure: log-log 3D histogram of PageRank against the sociability parameter w_i]

It would be interesting to see whether the Bradley-Terry model (or equivalently, Plackett-Luce with m = 2) would show a similar relationship. A nonparametric Plackett-Luce model was proposed in an earlier paper (Caron, Teh & Murphy, 2014). Since it is similarly based on a marked Poisson process with an intensity defined by a gamma process, it would also be interesting to compare the scalability of the two models. MATLAB code for nonparametric Plackett-Luce is also available from François Caron's homepage.
