Fully Bayesian inference for generalized GP models with HMC

James Hensman, 2015, 2016, 2017

It’s possible to construct very flexible models with Gaussian processes by combining them with different likelihoods (sometimes called ‘families’ in the GLM literature). This makes inference of the GP intractable, since the likelihood is not generally conjugate to the Gaussian process. The general form of the model is

\[\begin{split}\theta \sim p(\theta)\\f \sim \mathcal {GP}(m(x; \theta),\, k(x, x'; \theta))\\y_i \sim p(y \,|\, g(f(x_i)))\,.\end{split}\]

To perform inference in this model, we’ll run MCMC using Hamiltonian Monte Carlo (HMC) over the function values and the parameters \(\theta\) jointly. Key to an effective scheme is a rotation (whitening) of the field using the Cholesky decomposition of the kernel matrix. We write

\[\begin{split}\theta \sim p(\theta)\\v \sim \mathcal {N}(0,\, I)\\LL^\top = K\\f = m + Lv\\y_i \sim p(y \,|\, g(f(x_i)))\,.\end{split}\]

Joint HMC over \(v\) and the parameters is not widely adopted in the literature because of the difficulty of differentiating through the Cholesky factorization \(LL^\top=K\). We’ve made this derivative available in TensorFlow, so applying HMC is relatively straightforward.
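
To make the whitened parameterisation concrete, here is a minimal NumPy sketch (the kernel, inputs and zero mean are illustrative assumptions, not part of the model below): sampling \(v \sim \mathcal{N}(0, I)\) and mapping it through the Cholesky factor of \(K\) recovers a draw \(f \sim \mathcal{N}(m, K)\).

import numpy as np

def rbf(X, lengthscale=1.0, variance=1.0):
    # Illustrative squared-exponential kernel; any k(x, x'; theta) would do.
    d = X[:, None] - X[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

X = np.linspace(-3, 3, 50)
K = rbf(X) + 1e-6 * np.eye(len(X))  # small jitter for numerical stability
L = np.linalg.cholesky(K)           # K = L L^T

v = np.random.randn(len(X))         # v ~ N(0, I): the variable HMC actually explores
f = L @ v                           # f = m + L v with m = 0, so f ~ N(0, K)

Because the prior on \(v\) is an isotropic Gaussian, sampling \(v\) (together with \(\theta\)) is much better conditioned than sampling \(f\) directly.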

Exponential Regression example

The first illustration in this notebook is ‘Exponential Regression’. The model is

\[\begin{split}\theta \sim p(\theta)\\f \sim \mathcal {GP}(0, k(x, x'; \theta))\\f_i = f(x_i)\\y_i \sim \mathcal {Exp} (e^{f_i})\end{split}\]

We’ll use MCMC to infer both the kernel parameters \(\theta\) and the latent function values \(f\). First, generate a data set.

In [2]:
import gpflow
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 6)

# Toy data: exponentially distributed observations with an X-dependent scale.
X = np.linspace(-3, 3, 20)
Y = np.random.exponential(np.sin(X) ** 2)

GPflow’s model for fully-Bayesian MCMC is called GPMC. It’s constructed like any other model, but contains a parameter V which represents the whitened (rotated) function values \(v\) from the parameterization above.

In [3]:
with gpflow.defer_build():  # postpone graph construction so we can attach priors before compiling
    k = gpflow.kernels.Matern32(1, ARD=False) + gpflow.kernels.Bias(1)  # Matern 3/2 plus constant kernel on 1-D inputs
    l = gpflow.likelihoods.Exponential()
    m = gpflow.models.GPMC(X[:, None], Y[:, None], k, l)

The V parameter already has a prior applied. We’ll also add priors to the kernel parameters (these are rather arbitrary, for illustration).

In [4]:
m.kern.matern32.lengthscales.prior = gpflow.priors.Gamma(1., 1.)
m.kern.matern32.variance.prior = gpflow.priors.Gamma(1.,1.)
m.kern.bias.variance.prior = gpflow.priors.Gamma(1.,1.)

Running HMC is pretty similar to optimizing a model. GPflow only has HMC sampling for the moment, and it’s a relatively vanilla implementation (no NUTS, for example). There are two things to tune: the step size (epsilon) and the number of leapfrog steps [Lmin, Lmax]. Each proposal takes a random number of steps between Lmin and Lmax, each of length epsilon.
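
To clarify what these parameters control, here is a generic leapfrog sketch of a single HMC proposal. This is a simplified illustration of the algorithm, not GPflow’s internal implementation, and the Metropolis accept/reject step is omitted.

import numpy as np

def hmc_proposal(x, grad_log_prob, epsilon, lmin, lmax):
    # One proposal: a random number of leapfrog steps, each of length epsilon.
    # grad_log_prob(x) should return the gradient of the log target density.
    n_steps = np.random.randint(lmin, lmax + 1)
    p = np.random.randn(*x.shape)               # resample the momentum
    x = x.copy()
    p = p + 0.5 * epsilon * grad_log_prob(x)    # initial half step for momentum
    for _ in range(n_steps - 1):
        x = x + epsilon * p                     # full step for position
        p = p + epsilon * grad_log_prob(x)      # full step for momentum
    x = x + epsilon * p                         # final position step
    p = p + 0.5 * epsilon * grad_log_prob(x)    # final half step for momentum
    return x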

We would use the verbose setting to monitor the acceptance rate, but it is currently broken, so it is commented out in the call below.

In [5]:
m.compile()
o = gpflow.train.AdamOptimizer(0.01)
o.minimize(m, maxiter=15)  # a few optimizer steps to start the chain near the MAP

s = gpflow.train.HMC()
samples = s.sample(m, 100, epsilon=0.12, lmax=20, lmin=5, thin=5, logprobs=False)  # verbose=True is currently broken
In [6]:
samples.head()
Out[6]:
GPMC/V GPMC/kern/bias/variance GPMC/kern/matern32/lengthscales GPMC/kern/matern32/variance
0 [[-0.144405700366], [-0.141889900428], [-0.140... 0.972360 0.914213 0.994590
1 [[-1.24350206927], [1.39075772477], [-1.547773... 0.647319 2.305985 1.363222
2 [[-1.4669149009], [0.310704499354], [1.4375941... 0.141541 0.830376 0.467676
3 [[-0.173718563696], [-1.00715986814], [1.48279... 0.125000 0.447166 0.473290
4 [[-1.36728182732], [0.204019170071], [1.762279... 1.461104 0.009536 0.774549
In [7]:
xtest = np.linspace(-4, 4, 100)[:, None]
f_samples = []
for i, s in samples.iterrows():
    m.assign(s)  # load this posterior sample's parameter values into the model
    f_samples.append(m.predict_f_samples(xtest, 5, initialize=False))
f_samples = np.vstack(f_samples)
In [8]:
rate_samples = np.exp(f_samples[:, :, 0])

line, = plt.plot(xtest, np.mean(rate_samples, 0), lw=2)
plt.fill_between(xtest[:,0],
                 np.percentile(rate_samples, 5, axis=0),
                 np.percentile(rate_samples, 95, axis=0),
                 color=line.get_color(), alpha = 0.2)

plt.plot(X, Y, 'kx', mew=2)
plt.ylim(-0.1, np.max(np.percentile(rate_samples, 95, axis=0)))
Out[8]:
(-0.1, 3.1431644029407044)
../_images/notebooks_mcmc_11_1.png
In [10]:
samples['GPMC/kern/bias/variance'].hist(bins=20)
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x12893d358>
../_images/notebooks_mcmc_13_1.png