How GPflow relates to TensorFlow: tips & tricks

GPflow is built on top of TensorFlow, so it is useful to have some understanding of how TensorFlow works. In particular, TensorFlow’s two-stage concept of first building a static compute graph and then executing it for specific input values can cause problems. This notebook aims to help with the most common issues.

[1]:
import gpflow
import tensorflow as tf
import numpy as np

from gpflow import settings

1. Computation time increases when I create GPflow objects

The following example shows a typical situation in which computation time increases in proportion to the number of GPflow objects created.

[2]:
for n in range(2, 4):
    kernel = gpflow.kernels.RBF(input_dim=1)  # This is a GPflow object with tf.Variables inside
    x = np.random.randn(n, 1)  # GPflow expects rank-2 input matrices, even for D=1
    kxx = kernel.K(x)  # This is a tensor!

Remember, we operate on a TensorFlow graph!

Every time we create (build and compile) a new GPflow object, we add more tensors to the graph. Overwriting the Python variable (in this case, kernel) only changes which object the name refers to; the tensors created in earlier iterations stay in the graph.

This unnecessary growth of the graph slows down your computation!
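The effect does not depend on anything GPflow-specific, so it can be reproduced with a few lines of plain Python. In the sketch below, ToyGraph and ToyKernel are hypothetical stand-ins (not TensorFlow or GPflow classes): the graph only ever appends nodes, and rebinding the name kernel inside the loop does not remove the nodes added by earlier iterations, so the graph size grows linearly with the number of objects built.

```python
class ToyGraph:
    """Hypothetical stand-in for a TensorFlow graph: nodes can only be appended."""

    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        self.nodes.append(name)
        return len(self.nodes) - 1  # the index serves as a handle to the node


class ToyKernel:
    """Hypothetical stand-in for a GPflow kernel that adds its variables when built."""

    def __init__(self, graph):
        self.variance = graph.add_node("variance")
        self.lengthscales = graph.add_node("lengthscales")


def graph_size_after(num_kernels):
    """Build `num_kernels` kernels in one graph and report how many nodes it holds."""
    graph = ToyGraph()
    for _ in range(num_kernels):
        kernel = ToyKernel(graph)  # rebinding `kernel` leaves the old nodes in place
    return len(graph.nodes)


for k in (1, 10, 100):
    print(k, graph_size_after(k))  # 2, 20 and 200 nodes respectively
```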

The following examples show how to fix the issue (imagine running this code snippet in IPython repeatedly):

[3]:
for n in range(2, 4):
    gpflow.reset_default_graph_and_session()
    kernel = gpflow.kernels.RBF(1)
    x = np.random.randn(n, 1)
    kxx = kernel.K(x)

Here we simply reset the default graph and session using GPflow’s reset_default_graph_and_session() function. In the next example we explicitly build new tf.Graph() and tf.Session() objects:

[4]:
for n in range(2, 4):
    with tf.Graph().as_default() as graph:
        with tf.Session(graph=graph).as_default():
            kernel = gpflow.kernels.RBF(1)
            x = np.random.randn(n, 1)
            kxx = kernel.K(x)

In the Custom mean functions notebook we show a real-world example of this idea.

2. I want to reuse a model on different data

[5]:
np.random.seed(1)
x = np.random.randn(2, 1)
y = np.random.randn(2, 1)
kernel = gpflow.kernels.RBF(1)
model = gpflow.models.GPR(x, y, kernel)
print(model.compute_log_likelihood())

x_new = np.random.randn(100, 1)
y_new = np.random.randn(100, 1)
-2.8766930392437593

We can compute the log-likelihood of the model on different data. Note that we didn’t change the original model!

[6]:
x_tensor = model.X.parameter_tensor
y_tensor = model.Y.parameter_tensor
model.compute_log_likelihood(feed_dict={x_tensor: x_new, y_tensor: y_new})  # the model itself still holds the old data
[6]:
-140.83017563045192

We can get the same result by permanently updating the values of the DataHolders.

[7]:
model.X = x_new
model.Y = y_new
model.compute_log_likelihood()
[7]:
-140.83017563045192
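Conceptually, feed_dict substitutes new values for the data tensors for a single evaluation, whereas assigning to model.X and model.Y replaces the stored data. The pure-Python sketch below illustrates that difference with a hypothetical ToyGPR class (not GPflow's implementation). It computes the standard GP regression log marginal likelihood, log p(y) = -0.5 y^T K^{-1} y - 0.5 log|K| - (n/2) log(2 pi) with K = k(X, X) + sigma^2 I, using an RBF kernel with unit variance and lengthscale, a unit noise variance sigma^2, and a Cholesky factorisation.

```python
import math


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T == a, for a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_substitution(L, b):
    """Solve L z = b for z, where L is lower triangular."""
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z


class ToyGPR:
    """Hypothetical GP regression model that stores its data, loosely mirroring model.X / model.Y."""

    def __init__(self, X, Y, noise_variance=1.0):
        self.X = list(X)
        self.Y = list(Y)
        self.noise_variance = noise_variance

    def log_likelihood(self, feed=None):
        # `feed` plays the role of feed_dict: it overrides the data for this call only.
        feed = feed or {}
        X = feed.get("X", self.X)
        Y = feed.get("Y", self.Y)
        n = len(X)
        K = [[rbf(X[i], X[j]) + (self.noise_variance if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        L = cholesky(K)
        alpha = forward_substitution(L, Y)  # L^{-1} y, so y^T K^{-1} y = |alpha|^2
        return (-0.5 * sum(a * a for a in alpha)
                - sum(math.log(L[i][i]) for i in range(n))
                - 0.5 * n * math.log(2 * math.pi))


model = ToyGPR(X=[0.0, 1.0], Y=[0.5, -0.5])
new = {"X": [0.0, 1.0, 2.0], "Y": [1.0, 0.0, -1.0]}

temporary = model.log_likelihood(feed=new)  # evaluate on new data ...
assert model.X == [0.0, 1.0]                # ... without touching the stored data

model.X, model.Y = new["X"], new["Y"]       # permanent update, like model.X = x_new
permanent = model.log_likelihood()
assert temporary == permanent
```

Both calls agree because they evaluate the same expression on the same data; only the second one changes what the model stores.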

3. I want to use external TensorFlow tensors and pass them to a GPflow model

You can pass TensorFlow tensors for any non-trainable parameters of GPflow objects, such as DataHolders.

[8]:
np.random.seed(1)
kernel = gpflow.kernels.RBF(1)
likelihood = gpflow.likelihoods.Gaussian()

x_tensor = tf.random_normal((100, 1), dtype=settings.float_type)
y_tensor = tf.random_normal((100, 1), dtype=settings.float_type)
z = np.random.randn(10, 1)

model = gpflow.models.SVGP(x_tensor, y_tensor, kern=kernel, likelihood=likelihood, Z=z)
model.compute_log_likelihood()
[8]:
-196.46001677717464

You can also use TensorFlow variables for trainable parameters:

[9]:
z = tf.Variable(np.random.randn(10, 1))
model = gpflow.models.SVGP(x_tensor, y_tensor, kern=kernel, likelihood=likelihood, Z=z)

However, in this case you have to initialise them manually before interacting with the model:

[10]:
session = gpflow.get_default_session()
session.run(z.initializer)
model.compute_log_likelihood()
[10]:
-193.12198293763578
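In TensorFlow 1.x a variable has no value in a session until its initializer has been run. GPflow takes care of this for the variables it creates itself, but a tf.Variable created outside GPflow has to be initialised by you. The toy class below mimics that contract in plain Python; ToyVariable, initialize() and UninitializedError are hypothetical names, and TensorFlow's actual error type and message differ.

```python
class UninitializedError(RuntimeError):
    """Hypothetical error raised when reading a variable before initialisation."""


class ToyVariable:
    """Hypothetical stand-in for tf.Variable: it knows its initial value but holds nothing yet."""

    def __init__(self, initial_value):
        self._initial_value = initial_value
        self._value = None
        self._initialized = False

    def initialize(self):
        # Plays the role of session.run(variable.initializer).
        self._value = self._initial_value
        self._initialized = True

    def read(self):
        if not self._initialized:
            raise UninitializedError("attempting to use an uninitialised variable")
        return self._value


z = ToyVariable([0.1, -0.3, 0.7])
try:
    z.read()
except UninitializedError as err:
    print("before initialisation:", err)

z.initialize()
print("after initialisation:", z.read())  # [0.1, -0.3, 0.7]
```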

4. I want to share parameters between GPflow objects

Sometimes we want to impose a hard-coded structure on the model (for example, a multi-output model in which some output dimensions share the same kernel and others don’t). Unfortunately, we cannot do this after the kernel object has been compiled. Instead, we have to set up the sharing before the object is built, inside a defer_build() context, and then compile the object manually.

[11]:
with gpflow.decors.defer_build():
    kernels = [gpflow.kernels.RBF(1) for _ in range(3)]
    mo_kernels = gpflow.multioutput.kernels.SeparateMixedMok(kernels, W=np.random.randn(3, 4))
    mo_kernels.kernels[0].lengthscales = mo_kernels.kernels[1].lengthscales
    mo_kernels.compile()

assert mo_kernels.kernels[0].lengthscales is mo_kernels.kernels[1].lengthscales
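Sharing works by reference: after the assignment, both kernels hold the same parameter object rather than two copies with equal values, so an update made through one kernel is seen by the other. The plain-Python sketch below shows only these reference semantics; Parameter and RBF here are hypothetical minimal classes, not GPflow's.

```python
class Parameter:
    """Hypothetical minimal parameter: a mutable box holding a value."""

    def __init__(self, value):
        self.value = value


class RBF:
    """Hypothetical kernel holding its lengthscale as a Parameter object."""

    def __init__(self, lengthscale=1.0):
        self.lengthscales = Parameter(lengthscale)


kernels = [RBF() for _ in range(3)]
# Tie the first two kernels: both attributes now refer to the *same* Parameter object.
kernels[0].lengthscales = kernels[1].lengthscales

kernels[1].lengthscales.value = 2.5  # e.g. an update made by an optimiser
print([k.lengthscales.value for k in kernels])  # [2.5, 2.5, 1.0]

assert kernels[0].lengthscales is kernels[1].lengthscales
assert kernels[2].lengthscales is not kernels[1].lengthscales
```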

5. Optimising my model repeatedly slows down the computation time

The following is an example of bad practice:

[12]:
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)

optimizer = gpflow.training.AdamOptimizer()

optimizer.minimize(model, maxiter=2)

# Do something with the model

optimizer.minimize(model, maxiter=2)

Each minimize() call creates a new set of optimisation tensors in the graph, so calling minimize() again causes the same graph growth discussed under issue (1).

The correct way of optimising your model without polluting your graph is as follows:

[13]:
kernel = gpflow.kernels.RBF(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)

optimizer = gpflow.training.AdamOptimizer()
optimizer_tensor = optimizer.make_optimize_tensor(model)
session = gpflow.get_default_session()
for _ in range(2):
    session.run(optimizer_tensor)

Don’t forget to anchor your model to the session after optimisation, so that the model’s parameters pick up the optimised values. Then you can continue working with your model.

[14]:
model.anchor(session)

Now, if you need to optimise it again, you can reuse the same optimisation tensor.

[15]:
for _ in range(2):
    session.run(optimizer_tensor)

model.anchor(session)
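The pattern of building the update once and running it many times can be sketched without TensorFlow. Below, make_adam_step (a hypothetical helper, not GPflow's API) constructs a single Adam update closure, playing the role of the tensor returned by make_optimize_tensor(), and each call to step() plays the role of session.run(optimizer_tensor). The update itself is the standard Adam rule with bias-corrected first and second moment estimates; the objective is a toy quadratic with its minimum at (3, -1), not a GP model.

```python
import math


def make_adam_step(params, grad_fn, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Build an Adam update once; the returned closure keeps its state between calls."""
    m = [0.0] * len(params)  # first-moment (mean) estimates
    v = [0.0] * len(params)  # second-moment (uncentred variance) estimates
    t = 0                    # step counter used for bias correction

    def step():
        nonlocal t
        t += 1
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # updates params in place
        return t

    return step


def grad(p):
    """Gradient of f(a, b) = (a - 3)^2 + (b + 1)^2."""
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]


params = [0.0, 0.0]
step = make_adam_step(params, grad)  # built once ...
for _ in range(500):
    step()                           # ... and run repeatedly
print([round(p, 2) for p in params])
```

Calling make_adam_step a second time would build a separate closure, the analogue of calling minimize() again instead of reusing the existing tensor.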

6. When I try to read parameter values, I’m getting stale values

[16]:
np.random.seed(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)

kernel = gpflow.kernels.RBF(1)
model = gpflow.models.GPR(x, y, kernel)
optimizer = gpflow.training.AdamOptimizer()
optimizer_tensor = optimizer.make_optimize_tensor(model)

The initial value before optimisation is:

[17]:
model.kern.lengthscales.value
[17]:
array(1.)

Let’s call one step of the optimisation and check the new value of the parameter.

[18]:
gpflow.get_default_session().run(optimizer_tensor)
model.kern.lengthscales.value
[18]:
array(1.)

After the optimisation step you would expect the parameter to have been updated, but apparently it wasn’t. The reason is that the value property returns a cached NumPy value of the parameter rather than reading it from the session.

You can get the current value of the optimised parameter by using the read_value() method, passing it the session in which the optimisation ran.

[19]:
model.kern.lengthscales.read_value(session)
[19]:
1.0006322362558255

Alternatively, you can call anchor(session) on your model after the optimisation step; anchor() updates the parameters’ cached values.

NOTE: The anchor(session) method is significantly more time-consuming than read_value(session). Avoid calling it more often than necessary.

[20]:
model.anchor(session)
model.kern.lengthscales.value
[20]:
array(1.00063224)
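The three ways of reading the parameter can be pictured with a toy model in which a session holds the live values and each parameter keeps a separately cached copy. ToySession and ToyParameter are hypothetical classes; the sketch does not reproduce GPflow's internals and does not model why anchor() is the more expensive call.

```python
class ToySession:
    """Hypothetical stand-in for a session: it owns the live values of variables."""

    def __init__(self):
        self.store = {}


class ToyParameter:
    """Hypothetical parameter with a cached copy, mimicking `.value` vs `.read_value()`."""

    def __init__(self, name, value, session):
        self.name = name
        session.store[name] = value
        self._cache = value  # snapshot taken when the parameter is built

    @property
    def value(self):
        return self._cache  # cheap, but may be stale

    def read_value(self, session):
        return session.store[self.name]  # always the live value

    def anchor(self, session):
        self._cache = session.store[self.name]  # refresh the cache from the session


session = ToySession()
lengthscale = ToyParameter("lengthscales", 1.0, session)

session.store["lengthscales"] = 1.0006  # an optimiser step updates only the live value

print(lengthscale.value)                # 1.0    (stale cached copy)
print(lengthscale.read_value(session))  # 1.0006 (live value)
lengthscale.anchor(session)
print(lengthscale.value)                # 1.0006 (cache refreshed)
```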

7. I want to save and load a GPflow model

[21]:
kernel = gpflow.kernels.RBF(1)
x = np.random.randn(100, 1)
y = np.random.randn(100, 1)
model = gpflow.models.GPR(x, y, kernel)

from pathlib import Path
filename = "/tmp/gpr.gpflow"
path = Path(filename)
if path.exists():
    path.unlink()
saver = gpflow.saver.Saver()
saver.save(filename, model)

You can load the model into a different graph:

[22]:
with tf.Graph().as_default() as graph, tf.Session(graph=graph).as_default():
    model_copy = saver.load(filename)

Alternatively, you can load the model into the same graph and session:

[23]:
ctx_for_loading = gpflow.saver.SaverContext(autocompile=False)
model_copy = saver.load(filename, context=ctx_for_loading)
model_copy.clear()
model_copy.compile()

The two approaches differ in the TensorFlow name scopes used for naming variables. The former approach recreates TensorFlow objects with the same names as those that already exist in the original graph, so we need to load the model into a new graph. The latter approach uses different name scopes for the variables, so that you can load the model into the same graph.
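TensorFlow avoids name clashes within one graph by appending a numeric suffix (scope, scope_1, ...) when a name is reused. The sketch below is a simplified, hypothetical model of that bookkeeping: ToyNameRegistry and build_variables are illustrative names, and the GPR/... variable names are made up rather than GPflow's actual ones. It shows why a fresh graph reproduces the original names, while building again in the same graph yields distinct ones.

```python
class ToyNameRegistry:
    """Hypothetical per-graph name registry, mimicking a `name`, `name_1`, ... scheme."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else f"{name}_{count}"


def build_variables(registry, scope, names=("kern/lengthscales", "kern/variance")):
    """Create variable names under a scope that is made unique within `registry`."""
    prefix = registry.unique_name(scope)
    return [f"{prefix}/{n}" for n in names]


graph = ToyNameRegistry()
original = build_variables(graph, "GPR")
copy_same_graph = build_variables(graph, "GPR")             # built again in the same graph
copy_new_graph = build_variables(ToyNameRegistry(), "GPR")  # loaded into a fresh graph

print(original)         # ['GPR/kern/lengthscales', 'GPR/kern/variance']
print(copy_same_graph)  # ['GPR_1/kern/lengthscales', 'GPR_1/kern/variance']
print(copy_new_graph)   # ['GPR/kern/lengthscales', 'GPR/kern/variance']
```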