Handling models in GPflow

James Hensman, November 2015 and January 2016; Artem Artemev, December 2017

One of the key ingredients in GPflow is the model class, which allows the user to carefully control parameters. This notebook shows how some of these parameter control features work, and how to build your own model with GPflow. First we’ll look at

  • How to view models and parameters
  • How to set parameter values
  • How to constrain parameters (e.g. variance > 0)
  • How to fix model parameters
  • How to apply priors to parameters
  • How to optimize models

Then we’ll show how to build a simple logistic regression model, demonstrating the ease of the parameter framework.

GPy users should feel right at home, but there are some small differences.

First, let’s deal with the usual notebook boilerplate and make a simple GP regression model. See the Regression notebook for specifics of the model: we just want some parameters to play with.

In [21]:
import gpflow
import numpy as np

Create a very simple GPR model without building it in the TensorFlow graph, using the defer_build context manager.

In [22]:

with gpflow.defer_build():
    X = np.random.rand(20, 1)
    Y = np.sin(12 * X) + 0.66 * np.cos(25 * X) + np.random.randn(20,1) * 0.01
    m = gpflow.models.GPR(X, Y, kern=gpflow.kernels.Matern32(1) + gpflow.kernels.Linear(1))

Viewing, getting and setting parameters

You can display the state of the model in a terminal with print(m), or in a notebook by simply returning it; here we call m.as_pandas_table() to render it as a table.

In [23]:
m.as_pandas_table()
Out[23]:
class prior transform trainable shape fixed_shape value
GPR/kern/matern32/variance Parameter None +ve True () True 1.0
GPR/kern/matern32/lengthscales Parameter None +ve True () True 1.0
GPR/kern/linear/variance Parameter None +ve True () True 1.0
GPR/likelihood/variance Parameter None +ve True () True 1.0

This model has four parameters. The kernel is the sum of two parts: the Matern32 kernel has a variance parameter and a lengthscale parameter, while the linear kernel has only a variance parameter. There is also a parameter controlling the noise variance, as part of the likelihood.

All of the model parameters have been initialized to one. Individual parameters can be accessed in the same way as they are displayed in the table: to see all the parameters that are part of the likelihood, do

In [24]:
m.likelihood.as_pandas_table()
Out[24]:
class prior transform trainable shape fixed_shape value
GPR/likelihood/variance Parameter None +ve True () True 1.0

This gets more useful with more complex models!

To set the value of a parameter, just assign.

In [25]:
m.kern.matern32.lengthscales = 0.5
m.likelihood.variance = 0.01
m.as_pandas_table()
Out[25]:
class prior transform trainable shape fixed_shape value
GPR/kern/matern32/variance Parameter None +ve True () True 1.0
GPR/kern/matern32/lengthscales Parameter None +ve True () True 0.5
GPR/kern/linear/variance Parameter None +ve True () True 1.0
GPR/likelihood/variance Parameter None +ve True () True 0.01
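
As an aside (a small sketch, not part of the original notebook), a single parameter's current value can also be read back with read_value(), which we use again near the end of this notebook:

m.likelihood.variance.read_value()  # returns the current value, here 0.01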

Constraints and trainable variables

GPflow helpfully creates an unconstrained representation of all the variables. Above, all the variables are constrained positive (see the transform column of the table); the unconstrained representation is given by \(\alpha = \log(\exp(\theta)-1)\). The read_trainables method returns the current (constrained) values of the trainable parameters:

In [26]:
m.read_trainables()
Out[26]:
{'GPR/kern/linear/variance': array(1.0),
 'GPR/kern/matern32/lengthscales': array(0.5),
 'GPR/kern/matern32/variance': array(1.0),
 'GPR/likelihood/variance': array(0.01)}
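
As a quick sanity check (a minimal sketch, not part of the original notebook), the unconstrained value behind a positive parameter can be computed by hand from the formula above:

theta = 1.0                        # constrained (positive) value, e.g. a unit variance
alpha = np.log(np.exp(theta) - 1)  # unconstrained representation under the +ve transform
print(alpha)                       # approximately 0.5413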

Constraints are handled by the Transform classes. You might prefer the constraint \(\alpha = \log(\theta)\); this is easily done by setting the transform attribute on a parameter, provided the model has not been compiled yet:

In [27]:
m.kern.matern32.lengthscales.transform = gpflow.transforms.Exp()
m.read_trainables()
Out[27]:
{'GPR/kern/linear/variance': array(1.0),
 'GPR/kern/matern32/lengthscales': array(0.5),
 'GPR/kern/matern32/variance': array(1.0),
 'GPR/likelihood/variance': array(0.01)}

The values reported by read_trainables are unchanged, because they are the constrained parameter values; what has changed is the underlying unconstrained variable for the lengthscale, which is now \(\log(0.5)\) rather than \(\log(\exp(0.5)-1)\). Another helpful feature is the ability to fix parameters. This is done by simply setting the trainable attribute to False: the table then shows the parameter as not trainable, and the corresponding variable is removed from the set of trainable variables.

In [28]:
m.kern.linear.variance.trainable = False
m.as_pandas_table()
Out[28]:
class prior transform trainable shape fixed_shape value
GPR/kern/matern32/variance Parameter None +ve True () True 1.0
GPR/kern/matern32/lengthscales Parameter None Exp True () True 0.5
GPR/kern/linear/variance Parameter None +ve False () True 1.0
GPR/likelihood/variance Parameter None +ve True () True 0.01
In [29]:
m.read_trainables()
Out[29]:
{'GPR/kern/matern32/lengthscales': array(0.5),
 'GPR/kern/matern32/variance': array(1.0),
 'GPR/likelihood/variance': array(0.01)}

To unfix a parameter, just flip the boolean back; the parameter is then returned to the set of trainable variables.

In [30]:
m.kern.linear.variance.trainable = True
m.as_pandas_table()
Out[30]:
class prior transform trainable shape fixed_shape value
GPR/kern/matern32/variance Parameter None +ve True () True 1.0
GPR/kern/matern32/lengthscales Parameter None Exp True () True 0.5
GPR/kern/linear/variance Parameter None +ve True () True 1.0
GPR/likelihood/variance Parameter None +ve True () True 0.01

Priors

Priors are set just like transforms and fixes, using members of the gpflow.priors module. Let’s set a Gamma prior on the Matern32 variance.

In [31]:
m.kern.matern32.variance.prior = gpflow.priors.Gamma(2,3)
m.as_pandas_table()
Out[31]:
class prior transform trainable shape fixed_shape value
GPR/kern/matern32/variance Parameter Ga([ 2.],[ 3.]) +ve True () True 1.0
GPR/kern/matern32/lengthscales Parameter None Exp True () True 0.5
GPR/kern/linear/variance Parameter None +ve True () True 1.0
GPR/likelihood/variance Parameter None +ve True () True 0.01

Optimization

Optimization is done by creating an instance of an optimizer, in our case gpflow.train.ScipyOptimizer, which has optional arguments that are passed through to scipy.optimize.minimize (we minimize the negative log likelihood), and then calling its minimize method with the model as the optimization target. Variables that have priors are MAP-estimated, the others by maximum likelihood; i.e. we add the log prior to the log likelihood.

In [32]:
m.compile()
opt = gpflow.train.ScipyOptimizer()
opt.minimize(m)
INFO:tensorflow:Optimization terminated with:
  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
  Objective function value: 2.634113
  Number of iterations: 34
  Number of functions evaluations: 39
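
As an aside (a minimal sketch; the keyword names below are assumptions and may differ between GPflow versions), the optional arguments mentioned above can be supplied when constructing the optimizer and when calling minimize:

# Sketch only: pass options through to the underlying scipy optimizer.
# The keyword names are assumptions and may vary between GPflow versions.
opt = gpflow.train.ScipyOptimizer(method='L-BFGS-B')
opt.minimize(m, maxiter=100, disp=True)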

Building new models

To build new models, you’ll need to inherit from gpflow.models.Model. Parameters are instantiated with gpflow.Param. You may also be interested in gpflow.params.Parameterized, which acts as a ‘container’ of Params (kernels, for example, are Parameterized).
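
For example (a minimal sketch, not from the original notebook), a Parameterized container that groups a couple of Params might look like this; a model that holds it as an attribute picks up both parameters automatically:

# Sketch only: a Parameterized object acts as a container of Params.
class LinearBlock(gpflow.params.Parameterized):
    def __init__(self, input_dim, output_dim, name=None):
        super().__init__(name=name)
        self.W = gpflow.Param(np.random.randn(input_dim, output_dim))
        self.b = gpflow.Param(np.zeros(output_dim))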

In this very simple demo, we’ll implement linear multiclass classification. There will be two parameters: a weight matrix and a ‘bias’ (offset). The key thing to implement is the private _build_likelihood method, which should return a TensorFlow scalar representing the (log) likelihood. Param objects can be used inside _build_likelihood: thanks to the gpflow.params_as_tensors decorator used below, they appear as the appropriate (constrained) tensors.

In [33]:
import tensorflow as tf

class LinearMulticlass(gpflow.models.Model):
    def __init__(self, X, Y, name=None):
        super().__init__(name=name) # always call the parent constructor

        self.X = X.copy() # X is a numpy array of inputs
        self.Y = Y.copy() # Y is a 1-of-k representation of the labels

        self.num_data, self.input_dim = X.shape
        _, self.num_classes = Y.shape

        #make some parameters
        self.W = gpflow.Param(np.random.randn(self.input_dim, self.num_classes))
        self.b = gpflow.Param(np.random.randn(self.num_classes))

        # ^^ You must make the parameters attributes of the class for
        # them to be picked up by the model. i.e. this won't work:
        #
        # W = gpflow.Param(...    <-- must be self.W

    @gpflow.params_as_tensors
    def _build_likelihood(self): # takes no arguments
        p = tf.nn.softmax(tf.matmul(self.X, self.W) + self.b) # Param variables are used as tensorflow arrays.
        return tf.reduce_sum(tf.log(p) * self.Y) # be sure to return a scalar

…and that’s it. Let’s build a really simple demo to show that it works.

In [34]:
X = np.vstack([np.random.randn(10,2) + [2,2],
               np.random.randn(10,2) + [-2,2],
               np.random.randn(10,2) + [2,-2]])
Y = np.repeat(np.eye(3), 10, 0)

from matplotlib import pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12,6)
plt.scatter(X[:,0], X[:,1], 100, np.argmax(Y, 1), lw=2, cmap=plt.cm.viridis)
Out[34]:
<matplotlib.collections.PathCollection at 0x120f67a58>
[Figure: scatter plot of the three simulated clusters, coloured by class label]
In [35]:
m = LinearMulticlass(X, Y)
m.as_pandas_table()
Out[35]:
class prior transform trainable shape fixed_shape value
LinearMulticlass/W Parameter None (none) True (2, 3) True [[-1.86126389328, 0.274590690509, -0.749392536...
LinearMulticlass/b Parameter None (none) True (3,) True [0.898180439959, -0.0512106496384, -0.19183247...
In [36]:
opt = gpflow.train.ScipyOptimizer()
opt.minimize(m)
m.as_pandas_table()
INFO:tensorflow:Optimization terminated with:
  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
  Objective function value: 0.000011
  Number of iterations: 25
  Number of functions evaluations: 26
Out[36]:
class prior transform trainable shape fixed_shape value
LinearMulticlass/W Parameter None (none) True (2, 3) True [[8.19009259164, -17.4730464032, 6.94688807267...
LinearMulticlass/b Parameter None (none) True (3,) True [-5.00699554863, 7.85155019966, -2.18941734065]
In [37]:
xx, yy = np.mgrid[-4:4:200j, -4:4:200j]
X_test = np.vstack([xx.flatten(), yy.flatten()]).T
f_test = np.dot(X_test, m.W.read_value()) + m.b.read_value()
p_test = np.exp(f_test)
p_test /= p_test.sum(1)[:,None]
In [38]:
for i in range(3):
    plt.contour(xx, yy, p_test[:,i].reshape(200,200), [0.5], colors='k', linewidths=1)
plt.scatter(X[:,0], X[:,1], 100, np.argmax(Y, 1), lw=2, cmap=plt.cm.viridis)
Out[38]:
<matplotlib.collections.PathCollection at 0x10a330fd0>
[Figure: the p = 0.5 decision boundaries overlaid on the scatter plot of the training data]

That concludes the new model example and this notebook. You might want to convince yourself that the LinearMulticlass model and its parameters have all the functionality demonstrated above. You could also add some priors and run Hamiltonian Monte Carlo using the gpflow.train.HMC sampler and its sample method; see the sparse_MCMC notebook for details of running the sampler.
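
As a rough sketch (the argument names below are assumptions and may differ between GPflow versions), that might look something like this:

# Sketch only: add priors to the parameters, then draw HMC samples.
# Keyword names are assumptions; see the sparse_MCMC notebook for a worked example.
m.W.prior = gpflow.priors.Gaussian(0., 1.)
m.b.prior = gpflow.priors.Gaussian(0., 1.)

hmc = gpflow.train.HMC()
samples = hmc.sample(m, num_samples=100, epsilon=0.1)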