Sanity Check: when model behaviours should overlap

James Hensman and Alexander G. de G. Matthews, 2016

Many of the model classes in GPflow have overlapping behaviour in special cases. In this section, we fit some approximations to a model with a Gaussian likelihood, and make sure they’re all the same.

The models are: - GPR full Gaussian process regression

  • VGP a Gaussian approximation with Variational Bayes (approximating a Gaussian posterior with a Gaussian should be exact)
  • SVGP a sparse GP, with a Gaussian approximation. The inducing points are set to be at the data points, so again, should be exact.
  • SVGP (with whitened representation) As above, but with a rotation applied to whiten the representation of the process
  • SGPR A sparse GP with a collapsed posterior (Titsias 2009). Again, the inducing points are fixed to the data points.
  • GPRFITC The FITC approximation, again, the inducing points are fixed to the data points.

In all cases the parameters are estimated by the method using maximum likelihood (or approximate maximum likelihood, as appropriate). The parameter estimates should all be the same.

In [1]:
import gpflow
import numpy as np
import matplotlib
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
plt = matplotlib.pyplot
In [2]:
np.random.seed(0)
X = np.random.rand(20,1)*10
Y = np.sin(X) + 0.9 * np.cos(X*1.6) + np.random.randn(*X.shape)* 0.4
Xtest = np.random.rand(10,1)*10
plt.plot(X, Y, 'kx', mew=2)
Out[2]:
[<matplotlib.lines.Line2D at 0x1089a7240>]
../_images/notebooks_Sanity_check_2_1.png
In [3]:
m1 = gpflow.models.GPR(X, Y, kern=gpflow.kernels.RBF(1))
m2 = gpflow.models.VGP(X, Y, gpflow.kernels.RBF(1), likelihood=gpflow.likelihoods.Gaussian())
m3 = gpflow.models.SVGP(X, Y, gpflow.kernels.RBF(1),
                      likelihood=gpflow.likelihoods.Gaussian(),
                      Z=X.copy(), q_diag=False)
m3.feature.set_trainable(False)
m4 = gpflow.models.SVGP(X, Y, gpflow.kernels.RBF(1),
                      likelihood=gpflow.likelihoods.Gaussian(),
                      Z=X.copy(), q_diag=False, whiten=True)
m4.feature.set_trainable(False)
m5 = gpflow.models.SGPR(X, Y, gpflow.kernels.RBF(1), Z=X.copy())
m5.feature.set_trainable(False)
m6 = gpflow.models.GPRFITC(X, Y, gpflow.kernels.RBF(1), Z=X.copy())
m6.feature.set_trainable(False)
models = [m1, m2, m3, m4, m5, m6]

Now optimize the models. For GPR, SVGP and GPFITC this simply optimizes the hyper-parameters (since the inducing points are fixed). For the variational models, this jointly maximises the ELBO with respect to the variational parameters and the kernel parameters.

In [4]:
o = gpflow.train.ScipyOptimizer(method='BFGS')
_ = [o.minimize(m, maxiter=100) for m in models]
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834361
  Number of iterations: 8
  Number of functions evaluations: 12
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834361
  Number of iterations: 54
  Number of functions evaluations: 70
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834400
  Number of iterations: 54
  Number of functions evaluations: 73
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834400
  Number of iterations: 54
  Number of functions evaluations: 73
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834400
  Number of iterations: 8
  Number of functions evaluations: 12
INFO:tensorflow:Optimization terminated with:
  Message: Optimization terminated successfully.
  Objective function value: 18.834355
  Number of iterations: 8
  Number of functions evaluations: 12

If everything worked as planned, the models should have the same

  • prediction functions
  • log (marginal) likelihood
  • kernel parameters

For the variational models, where we use a ELBO in place of the likelihood, the ELBO should be tight to the likelihood in the cases studied here.

In [8]:
def plot(m, color, ax):
    xx = np.linspace(-1, 11, 100)[:,None]
    mu, var = m.predict_y(xx)
    ax.plot(xx, mu, color, lw=2)
    ax.fill_between(xx[:,0], mu[:,0] -  2*np.sqrt(var[:,0]), mu[:,0] +  2*np.sqrt(var[:,0]), color=color, alpha=0.2)
    ax.plot(X, Y, 'kx', mew=2)
    ax.set_xlim(-1, 11)

f, ax = plt.subplots(3,2,sharex=True, sharey=True, figsize=(12,9))
plot(m1, 'C0', ax[0,0])
plot(m2, 'C1', ax[1,0])
plot(m3, 'C2', ax[0,1])
plot(m4, 'C3', ax[1,1])
plot(m5, 'C4', ax[2,0])
plot(m6, 'C5', ax[2,1])
../_images/notebooks_Sanity_check_7_0.png

Here are the kernels and likelihoods, which show the fitted kernel parameters and noise variance:

In [18]:
from IPython import display
#_ = [display.display(m.kern, m.likelihood) for m in models]
[print(m.kern, '\n\n', m.likelihood, '\n\n----\n') for m in models];
<Parameter name:GPR/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774376245327446

<Parameter name:GPR/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.8256105491329836

 <Parameter name:GPR/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002167259084216

----

<Parameter name:VGP/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774376485988129

<Parameter name:VGP/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.82561040142936726

 <Parameter name:VGP/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002069250966058

----

<Parameter name:SVGP/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774407145919001

<Parameter name:SVGP/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.82561225599881838

 <Parameter name:SVGP/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002304036809423

----

<Parameter name:SVGP/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774407146160694

<Parameter name:SVGP/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.82561225648110581

 <Parameter name:SVGP/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002304016499685

----

<Parameter name:SGPR/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774406951137334

<Parameter name:SGPR/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.82561274879830437

 <Parameter name:SGPR/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002302477112654

----

<Parameter name:GPRFITC/kern/lengthscales [trainable] shape:() transform:+ve prior:None>
value: 1.0774388272951685

<Parameter name:GPRFITC/kern/variance [trainable] shape:() transform:+ve prior:None>
value: 0.82561176748687659

 <Parameter name:GPRFITC/likelihood/variance [trainable] shape:() transform:+ve prior:None>
value: 0.16002122963998355

----

Here are the likelihoods (or ELBOs)

In [15]:
print(np.array([m.compute_log_likelihood() for m in models]))
[-18.83436136 -18.83436136 -18.83439999 -18.83439999 -18.83439999
 -18.83435478]