GPflow with TensorFlow 2

Small steps big changes

from typing import Tuple, Optional
import tempfile
import pathlib

import datetime
import io
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
import gpflow

from gpflow.config import default_float
from gpflow.ci_utils import ci_niter
from gpflow.utilities import to_default_float

import warnings


Make tensorboard work inside notebook:

output_logdir = "/tmp/tensorboard"

!rm -rf "{output_logdir}"
!mkdir "{output_logdir}"

%load_ext tensorboard
%matplotlib inline

def enumerated_logdir(_logdir_id: int = [0]):
    logdir = pathlib.Path(output_logdir, str(_logdir_id[0]))
    _logdir_id[0] += 1
    return str(logdir)

Set up random seeds and default float for gpflow tensors:


Loading data using TensorFlow Datasets

For this example, we create a synthetic dataset (noisy sine function):

def noisy_sin(x):
    return tf.math.sin(x) + 0.1 * tf.random.normal(x.shape, dtype=default_float())

num_train_data, num_test_data = 100, 500

X = tf.random.uniform((num_train_data, 1), dtype=default_float()) * 10
Xtest = tf.random.uniform((num_test_data, 1), dtype=default_float()) * 10

Y = noisy_sin(X)
Ytest = noisy_sin(Xtest)

data = (X, Y)

plt.plot(X, Y, "xk")

Working with TensorFlow Datasets is an efficient way to rapidly shuffle, iterate, and batch from data. For prefetch size we use as recommended by TensorFlow guidelines.

train_dataset =, Y))
test_dataset =, Ytest))

batch_size = 32
num_features = 10
prefetch_size =
shuffle_buffer_size = num_train_data // 2
num_batches_per_epoch = num_train_data // batch_size

original_train_dataset = train_dataset
train_dataset = (


Define a GP model

In GPflow 2.0, we use tf.Module (or the very thin gpflow.base.Module wrapper) to build all our models, as well as their components (kernels, likelihoods, parameters, and so on).

kernel = gpflow.kernels.SquaredExponential(variance=2.0)
likelihood = gpflow.likelihoods.Gaussian()
inducing_variable = np.linspace(0, 10, num_features).reshape(-1, 1)

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

You can set a module (or a particular parameter) to be non-trainable using the auxiliary method set_trainable(module, False):

from gpflow import set_trainable

set_trainable(likelihood, False)
set_trainable(kernel.variance, False)

set_trainable(likelihood, True)
set_trainable(kernel.variance, True)

We can use param.assign(value) to assign a value to a parameter:

<tf.Variable 'UnreadVariable' shape=() dtype=float64, numpy=-0.43275212956718856>

All these changes are reflected when we use print_summary(model) to print a detailed summary of the model. By default the output is displayed in a minimalistic and simple table.

from gpflow.utilities import print_summary

print_summary(model)  # same as print_summary(model, fmt="fancy_table")
│ name                     │ class     │ transform        │ prior   │ trainable   │ shape       │ dtype   │ value            │
│ SVGP.kernel.variance     │ Parameter │ Softplus         │         │ True        │ ()          │ float64 │ 2.0              │
│ SVGP.kernel.lengthscales │ Parameter │ Softplus         │         │ True        │ ()          │ float64 │ 0.5              │
│ SVGP.likelihood.variance │ Parameter │ Softplus + Shift │         │ True        │ ()          │ float64 │ 1.0              │
│ SVGP.inducing_variable.Z │ Parameter │                  │         │ True        │ (10, 1)     │ float64 │ [[0....          │
│ SVGP.q_mu                │ Parameter │                  │         │ True        │ (10, 1)     │ float64 │ [[0....          │
│ SVGP.q_sqrt              │ Parameter │ FillTriangular   │         │ True        │ (1, 10, 10) │ float64 │ [[[1., 0., 0.... │

We can change default printing so that it will look nicer in our notebook:


print_summary(model)  # same as print_summary(model, fmt="notebook")
name class transform prior trainable shape dtype value
SVGP.kernel.variance ParameterSoftplus True () float642.0
SVGP.kernel.lengthscalesParameterSoftplus True () float640.5
SVGP.likelihood.varianceParameterSoftplus + Shift True () float641.0
SVGP.inducing_variable.ZParameter True (10, 1) float64[[0....
SVGP.q_mu Parameter True (10, 1) float64[[0....
SVGP.q_sqrt ParameterFillTriangular True (1, 10, 10)float64[[[1., 0., 0....

Jupyter notebooks also format GPflow classes (that are subclasses of gpflow.base.Module) in the same nice way when at the end of a cell (this is independent of the default_summary_fmt):

<gpflow.models.svgp.SVGP object at 0x7f4c2d74e2b0>
name class transform prior trainable shape dtype value
SVGP.kernel.variance ParameterSoftplus True () float642.0
SVGP.kernel.lengthscalesParameterSoftplus True () float640.5
SVGP.likelihood.varianceParameterSoftplus + Shift True () float641.0
SVGP.inducing_variable.ZParameter True (10, 1) float64[[0....
SVGP.q_mu Parameter True (10, 1) float64[[0....
SVGP.q_sqrt ParameterFillTriangular True (1, 10, 10)float64[[[1., 0., 0....

Training using training_loss and training_loss_closure

GPflow models come with training_loss and training_loss_closure methods to make it easy to train your models. There is a slight difference between models that own their own data (most of them, e.g. GPR, VGP, …) and models that do not own the data (SVGP).

Model-internal data

For models that own their own data (inheriting from InternalDataTrainingLossMixin), data is provided at model construction time. In this case, model.training_loss does not take any arguments, and can be directly passed to an optimizer’s minimize() method:

vgp_model = gpflow.models.VGP(data, kernel, likelihood)
optimizer = tf.optimizers.Adam()
    vgp_model.training_loss, vgp_model.trainable_variables
)  # Note: this does a single step
# In practice, you will need to call minimize() many times, this will be further discussed below.
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>

This also works for the Scipy optimizer, though it will do the full optimization on a single call to minimize():

optimizer = gpflow.optimizers.Scipy()
    vgp_model.training_loss, vgp_model.trainable_variables, options=dict(maxiter=ci_niter(1000))
      fun: -67.28078578558977
 hess_inv: <5153x5153 LbfgsInvHessProduct with dtype=float64>
      jac: array([ 2.72992855e-02,  2.62358059e-03, -9.17773041e-04, ...,
        1.62712087e-06, -1.19699086e-06,  1.00073483e-04])
     nfev: 654
      nit: 616
   status: 0
  success: True
        x: array([ 1.71832416e+00,  7.87852399e-01, -4.65257496e+00, ...,
        8.14457745e-05, -1.31619047e-04,  9.98382512e-01])

You can obtain a compiled version using training_loss_closure, whose compile argument is True by default:

vgp_model.training_loss_closure()  # compiled
vgp_model.training_loss_closure(compile=True)  # compiled
vgp_model.training_loss_closure(compile=False)  # uncompiled, same as vgp_model.training_loss
<bound method InternalDataTrainingLossMixin.training_loss of <gpflow.models.vgp.VGP object at 0x7f4c2ca47710>>

The SVGP model inherits from ExternalDataTrainingLossMixin and expects the data to be passed to training_loss(). For SVGP as for the other regression models, data is a two-tuple of (X, Y), where X is an array/tensor with shape (num_data, input_dim) and Y is an array/tensor with shape (num_data, output_dim):

assert isinstance(model, gpflow.models.SVGP)
<tf.Tensor: shape=(), dtype=float64, numpy=8197.418250350682>

To make optimizing it easy, it has a training_loss_closure() method, that takes the data and returns a closure that computes the training loss on this data:

optimizer = tf.optimizers.Adam()
training_loss = model.training_loss_closure(
)  # We save the compiled closure in a variable so as not to re-compile it each step
optimizer.minimize(training_loss, model.trainable_variables)  # Note that this does a single step
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>

SVGP can handle mini-batching, and an iterator from a batched can be passed to the model’s training_loss_closure():

batch_size = 5
batched_dataset =
training_loss = model.training_loss_closure(iter(batched_dataset))

optimizer.minimize(training_loss, model.trainable_variables)  # Note that this does a single step
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=2>

As previously, training_loss_closure takes an optional compile argument for tf.function compilation (True by default).

Training using Gradient Tapes

For a more elaborate example of a gradient update we can define an optimization_step that explicitly computes and applies gradients to the model. In TensorFlow 2, we can optimize (trainable) model parameters with TensorFlow optimizers using tf.GradientTape. In this simple example, we perform one gradient update of the Adam optimizer to minimize the training_loss (in this case the negative ELBO) of our model. The optimization_step can (and should) be wrapped in tf.function to be compiled to a graph if executing it many times.

def optimization_step(model: gpflow.models.SVGP, batch: Tuple[tf.Tensor, tf.Tensor]):
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        loss = model.training_loss(batch)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

We can use the functionality of TensorFlow Datasets to define a simple training loop that iterates over batches of the training dataset:

def simple_training_loop(model: gpflow.models.SVGP, epochs: int = 1, logging_epoch_freq: int = 10):
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)
    for epoch in range(epochs):
        for _ in range(ci_niter(num_batches_per_epoch)):
            tf_optimization_step(model, next(batches))

        epoch_id = epoch + 1
        if epoch_id % logging_epoch_freq == 0:
            tf.print(f"Epoch {epoch_id}: ELBO (train) {model.elbo(data)}")
simple_training_loop(model, epochs=10, logging_epoch_freq=2)
Epoch 2: ELBO (train) -7926.111921760165
Epoch 4: ELBO (train) -7721.99930789149
Epoch 6: ELBO (train) -7515.65797330493
Epoch 8: ELBO (train) -7311.34593176836
Epoch 10: ELBO (train) -7110.451170633543


gpflow.monitor provides a thin wrapper on top of tf.summary that makes it easy to monitor the training procedure. For a more detailed tutorial see the monitoring notebook.

from gpflow.monitor import (

samples_input = np.linspace(0, 10, 100).reshape(-1, 1)

def plot_model(fig, ax):
    mean, var = model.predict_f(samples_input)
    num_samples = 10
    samples = model.predict_f_samples(samples_input, num_samples)
    ax.plot(samples_input, mean, "C0", lw=2)
        samples_input[:, 0],
        mean[:, 0] - 1.96 * np.sqrt(var[:, 0]),
        mean[:, 0] + 1.96 * np.sqrt(var[:, 0]),
    ax.plot(X, Y, "kx")
    ax.plot(samples_input, samples[:, :, 0].numpy().T, "C0", linewidth=0.5)
    ax.set_ylim(-2.0, +2.0)
    ax.set_xlim(0, 10)

def print_cb(epoch_id=None, data=None):
    tf.print(f"Epoch {epoch_id}: ELBO (train)", model.elbo(data))

def elbo_cb(data=None, **_):
    return model.elbo(data)

output_logdir = enumerated_logdir()

model_task = ModelToTensorBoard(output_logdir, model)
elbo_task = ScalarToTensorBoard(output_logdir, elbo_cb, "elbo")
print_task = ExecuteCallback(callback=print_cb)

# We group these tasks and specify a period of `100` steps for them
fast_tasks = MonitorTaskGroup([model_task, elbo_task, print_task], period=100)

# We also want to see the model's fit during the optimisation
image_task = ImageToTensorBoard(output_logdir, plot_model, "samples_image")

# We typically don't want to plot too frequently during optimisation,
# which is why we specify a larger period for this task.
slow_taks = MonitorTaskGroup(image_task, period=500)
monitor = Monitor(fast_tasks, slow_taks)

def monitored_training_loop(epochs: int):
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)

    for epoch in range(epochs):
        for _ in range(ci_niter(num_batches_per_epoch)):
            batch = next(batches)
            tf_optimization_step(model, batch)

        epoch_id = epoch + 1
        monitor(epoch, epoch_id=epoch_id, data=data)

NOTE: for optimal performance it is recommended to wrap the monitoring inside tf.function. This is detailed in the monitoring notebook.

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

Epoch 1: ELBO (train) -7789.6452856668866
Epoch 101: ELBO (train) -2021.3204550733249
Epoch 201: ELBO (train) -840.54344338514625
Epoch 301: ELBO (train) -370.44491395531128
Epoch 401: ELBO (train) -156.38755328626456
Epoch 501: ELBO (train) -54.224209621895973
Epoch 601: ELBO (train) -7.5147314638143765
Epoch 701: ELBO (train) 14.668335934402975
Epoch 801: ELBO (train) 26.843925293003583
Epoch 901: ELBO (train) 34.522862288664925

Then, we can use TensorBoard to examine the training procedure in more detail

# %tensorboard --logdir "{output_logdir}"

Saving and loading models


With the help of tf.train.CheckpointManager and tf.train.Checkpoint, we can checkpoint the model throughout the training procedure. Let’s start with a simple example using checkpointing to save and load a tf.Variable:

initial_value = 1.2
a = tf.Variable(initial_value)

Create Checkpoint object:

ckpt = tf.train.Checkpoint(a=a)
manager = tf.train.CheckpointManager(ckpt, output_logdir, max_to_keep=3)

Save the variable a and change its value right after:

_ = a.assign(0.33)

Now we can restore the old variable value:

print(f"Current value of variable a: {a.numpy():0.3f}")


print(f"Value of variable a after restore: {a.numpy():0.3f}")
Current value of variable a: 0.330
Value of variable a after restore: 1.200

In the example below, we modify a simple training loop to save the model every 100 epochs using the CheckpointManager.

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

def checkpointing_training_loop(
    model: gpflow.models.SVGP,
    batch_size: int,
    epochs: int,
    manager: tf.train.CheckpointManager,
    logging_epoch_freq: int = 100,
    epoch_var: Optional[tf.Variable] = None,
    step_var: Optional[tf.Variable] = None,
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)

    for epoch in range(epochs):
        for step in range(ci_niter(num_batches_per_epoch)):
            tf_optimization_step(model, next(batches))
            if step_var is not None:
                step_var.assign(epoch * num_batches_per_epoch + step + 1)
        if epoch_var is not None:
            epoch_var.assign(epoch + 1)

        epoch_id = epoch + 1
        if epoch_id % logging_epoch_freq == 0:
            ckpt_path =
            tf.print(f"Epoch {epoch_id}: ELBO (train) {model.elbo(data)}, saved at {ckpt_path}")
step_var = tf.Variable(1, dtype=tf.int32, trainable=False)
epoch_var = tf.Variable(1, dtype=tf.int32, trainable=False)
ckpt = tf.train.Checkpoint(model=model, step=step_var, epoch=epoch_var)
manager = tf.train.CheckpointManager(ckpt, output_logdir, max_to_keep=5)

print(f"Checkpoint folder path at: {output_logdir}")

Checkpoint folder path at: /tmp/tensorboard/0
Epoch 100: ELBO (train) -149.50562733101683, saved at /tmp/tensorboard/0/ckpt-1
Epoch 200: ELBO (train) -8.544119245429293, saved at /tmp/tensorboard/0/ckpt-2
Epoch 300: ELBO (train) 24.732681722278755, saved at /tmp/tensorboard/0/ckpt-3
Epoch 400: ELBO (train) 36.71558117864048, saved at /tmp/tensorboard/0/ckpt-4
Epoch 500: ELBO (train) 42.093254355534015, saved at /tmp/tensorboard/0/ckpt-5
Epoch 600: ELBO (train) 45.016709584319905, saved at /tmp/tensorboard/0/ckpt-6
Epoch 700: ELBO (train) 46.87268322560482, saved at /tmp/tensorboard/0/ckpt-7
Epoch 800: ELBO (train) 48.205684409091916, saved at /tmp/tensorboard/0/ckpt-8
Epoch 900: ELBO (train) 49.25871097370154, saved at /tmp/tensorboard/0/ckpt-9
Epoch 1000: ELBO (train) 50.157254424668174, saved at /tmp/tensorboard/0/ckpt-10

After the models have been saved, we can restore them using tf.train.Checkpoint.restore and assert that their performance corresponds to that logged during training.

for i, recorded_checkpoint in enumerate(manager.checkpoints):
        f"{i} restored model from epoch {int(epoch_var)} [step:{int(step_var)}] : ELBO training set {model.elbo(data)}"
0 restored model from epoch 600 [step:1800] : ELBO training set 45.016709584319905
1 restored model from epoch 700 [step:2100] : ELBO training set 46.87268322560482
2 restored model from epoch 800 [step:2400] : ELBO training set 48.205684409091916
3 restored model from epoch 900 [step:2700] : ELBO training set 49.25871097370154
4 restored model from epoch 1000 [step:3000] : ELBO training set 50.157254424668174

Copying (hyper)parameter values between models

It is easy to interact with the set of all parameters of a model or a subcomponent programmatically.

The following returns a dictionary of all parameters within

model = gpflow.models.SGPR(data, kernel=kernel, inducing_variable=inducing_variable)
{'.kernel.variance': <gpflow.Parameter 'Variable:0' dtype=float64 unconstrained-shape=() unconstrained-value=0.36247287352968055 constrained-shape=() constrained-value=0.8907178115658207>,
 '.kernel.lengthscales': <gpflow.Parameter 'Variable:0' dtype=float64 unconstrained-shape=() unconstrained-value=1.4757353794379913 constrained-shape=() constrained-value=1.6816192759290174>,
 '.likelihood.variance': <gpflow.Parameter 'Variable:0' dtype=float64 unconstrained-shape=() unconstrained-value=0.5413232726357509 constrained-shape=() constrained-value=1.0>,
 '.inducing_variable.Z': <gpflow.Parameter 'Variable:0' dtype=float64 unconstrained-shape=(10, 1) unconstrained-value=[[ 0.        ]
  [ 1.11111111]
  [ 2.22222222]
  [ 3.33333333]
  [ 4.44444444]
  [ 5.55555556]
  [ 6.66666667]
  [ 7.77777778]
  [ 8.88888889]
  [10.        ]] constrained-shape=(10, 1) constrained-value=[[ 0.        ]
  [ 1.11111111]
  [ 2.22222222]
  [ 3.33333333]
  [ 4.44444444]
  [ 5.55555556]
  [ 6.66666667]
  [ 7.77777778]
  [ 8.88888889]
  [10.        ]]>}

Such a dictionary can be assigned back to this model (or another model with the same tree of parameters) as follows:

params = gpflow.utilities.parameter_dict(model)
gpflow.utilities.multiple_assign(model, params)

TensorFlow saved_model

At present, TensorFlow does not support saving custom variables like instances of the gpflow.base.Parameter class, see this TensorFlow github issue.

However, once training is complete, it is possible to clone the model and replace all gpflow.base.Parameters with tf.constants holding the same value:

<gpflow.models.sgpr.SGPR object at 0x7f4b5449da58>
name class transform prior trainable shape dtype value
SGPR.kernel.variance ParameterSoftplus True () float640.8907178115658207
SGPR.kernel.lengthscalesParameterSoftplus True () float641.6816192759290174
SGPR.likelihood.varianceParameterSoftplus + Shift True () float641.0
SGPR.inducing_variable.ZParameter True (10, 1)float64[[0....
frozen_model = gpflow.utilities.freeze(model)

In order to save the model we need to define a tf.Module holding the tf.function’s that we wish to export, as well as a reference to the underlying model:

module_to_save = tf.Module()
predict_fn = tf.function(
    frozen_model.predict_f, input_signature=[tf.TensorSpec(shape=[None, 1], dtype=tf.float64)]
module_to_save.predict = predict_fn

Save original result for futher comparison. We also convert samples_input to a tensor. For a tensor input a tf.function will compile a single graph.

samples_input = tf.convert_to_tensor(samples_input, dtype=default_float())
original_result = module_to_save.predict(samples_input)

Let’s save the module

save_dir = str(pathlib.Path(tempfile.gettempdir())), save_dir)
INFO:tensorflow:Assets written to: /tmp/assets

Load module back as new instance and compare predict results

loaded_model = tf.saved_model.load(save_dir)
loaded_result = loaded_model.predict(samples_input)

np.testing.assert_array_equal(loaded_result, original_result)

User config update

In this notebook, we used a lot gpflow.config methods for setting and getting default attributes from global configuration. However, GPflow provides a way for local config modification without updating values in global. As you can see below, using gpflow.config.as_context replaces temporarily global config with your instance. At creation time, custom config instance uses standard values from the global config:

user_config = gpflow.config.Config(float=tf.float32, positive_bijector="exp")

user_str = "User config\t"
global_str = "Global config\t"

with gpflow.config.as_context(user_config):
    print(f"{user_str} gpflow.config.default_float = {gpflow.config.default_float()}")
        f"{user_str} gpflow.config.positive_bijector = {gpflow.config.default_positive_bijector()}"

print(f"{global_str} gpflow.config.default_float = {gpflow.config.default_float()}")
print(f"{global_str} gpflow.config.positive_bijector = {gpflow.config.default_positive_bijector()}")
User config      gpflow.config.default_float = <dtype: 'float32'>
User config      gpflow.config.positive_bijector = exp
Global config    gpflow.config.default_float = <class 'numpy.float64'>
Global config    gpflow.config.positive_bijector = softplus
with gpflow.config.as_context(user_config):
    p = gpflow.Parameter(1.1, transform=gpflow.utilities.positive())

p = gpflow.Parameter(1.1, transform=gpflow.utilities.positive())
User config     <gpflow.Parameter 'Variable:0' dtype=float32 unconstrained-shape=() unconstrained-value=0.09531020373106003 constrained-shape=() constrained-value=1.100000023841858>
Global config   <gpflow.Parameter 'Variable:0' dtype=float64 unconstrained-shape=() unconstrained-value=0.6952280260717102 constrained-shape=() constrained-value=1.1>