GPflow with TensorFlow 2

Small steps big changes

from typing import Tuple, Optional
import tempfile
import pathlib

import datetime
import io
import matplotlib.pyplot as plt

import numpy as np
import tensorflow as tf
import gpflow

from gpflow.config import default_float
from gpflow.ci_utils import ci_niter
from gpflow.utilities import to_default_float

import warnings


Make tensorboard work inside notebook:

output_logdir = "/tmp/tensorboard"

!rm -rf "{output_logdir}"
!mkdir "{output_logdir}"

%load_ext tensorboard
%matplotlib inline

def enumerated_logdir(_logdir_id: int = [0]):
    logdir = pathlib.Path(output_logdir, str(_logdir_id[0]))
    _logdir_id[0] += 1
    return str(logdir)

Set up random seeds and default float for gpflow tensors:


Loading data using TensorFlow Datasets

For this example, we create a synthetic dataset (noisy sine function):

def noisy_sin(x):
    return tf.math.sin(x) + 0.1 * tf.random.normal(x.shape, dtype=default_float())

num_train_data, num_test_data = 100, 500

X = tf.random.uniform((num_train_data, 1), dtype=default_float()) * 10
Xtest = tf.random.uniform((num_test_data, 1), dtype=default_float()) * 10

Y = noisy_sin(X)
Ytest = noisy_sin(Xtest)

data = (X, Y)

plt.plot(X, Y, "xk")

Working with TensorFlow Datasets is an efficient way to rapidly shuffle, iterate, and batch from data. For prefetch size we use as recommended by TensorFlow guidelines.

train_dataset =, Y))
test_dataset =, Ytest))

batch_size = 32
num_features = 10
prefetch_size =
shuffle_buffer_size = num_train_data // 2
num_batches_per_epoch = num_train_data // batch_size

original_train_dataset = train_dataset
train_dataset = (


Define a GP model

In GPflow 2.0, we use tf.Module (or the very thin gpflow.base.Module wrapper) to build all our models, as well as their components (kernels, likelihoods, parameters, and so on).

kernel = gpflow.kernels.SquaredExponential(variance=2.0)
likelihood = gpflow.likelihoods.Gaussian()
inducing_variable = np.linspace(0, 10, num_features).reshape(-1, 1)

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

You can set a module (or a particular parameter) to be non-trainable using the auxiliary method set_trainable(module, False):

from gpflow import set_trainable

set_trainable(likelihood, False)
set_trainable(kernel.variance, False)

set_trainable(likelihood, True)
set_trainable(kernel.variance, True)

We can use param.assign(value) to assign a value to a parameter:

<tf.Variable 'UnreadVariable' shape=() dtype=float64, numpy=-0.43275212956718856>

All these changes are reflected when we use print_summary(model) to print a detailed summary of the model. By default the output is displayed in a minimalistic and simple table.

from gpflow.utilities import print_summary

print_summary(model)  # same as print_summary(model, fmt="fancy_table")
│ name                     │ class     │ transform        │ prior   │ trainable   │ shape       │ dtype   │ value            │
│ SVGP.kernel.variance     │ Parameter │ Softplus         │         │ True        │ ()          │ float64 │ 2.0              │
│ SVGP.kernel.lengthscales │ Parameter │ Softplus         │         │ True        │ ()          │ float64 │ 0.5              │
│ SVGP.likelihood.variance │ Parameter │ Softplus + Shift │         │ True        │ ()          │ float64 │ 1.0              │
│ SVGP.inducing_variable.Z │ Parameter │ Identity         │         │ True        │ (10, 1)     │ float64 │ [[0....          │
│ SVGP.q_mu                │ Parameter │ Identity         │         │ True        │ (10, 1)     │ float64 │ [[0....          │
│ SVGP.q_sqrt              │ Parameter │ FillTriangular   │         │ True        │ (1, 10, 10) │ float64 │ [[[1., 0., 0.... │

We can change default printing so that it will look nicer in our notebook:


print_summary(model)  # same as print_summary(model, fmt="notebook")
name class transform prior trainable shape dtype value
SVGP.kernel.variance ParameterSoftplus True () float642.0
SVGP.kernel.lengthscalesParameterSoftplus True () float640.5
SVGP.likelihood.varianceParameterSoftplus + Shift True () float641.0
SVGP.inducing_variable.ZParameterIdentity True (10, 1) float64[[0....
SVGP.q_mu ParameterIdentity True (10, 1) float64[[0....
SVGP.q_sqrt ParameterFillTriangular True (1, 10, 10)float64[[[1., 0., 0....

Jupyter notebooks also format GPflow classes (that are subclasses of gpflow.base.Module) in the same nice way when at the end of a cell (this is independent of the default_summary_fmt):

<gpflow.models.svgp.SVGP object at 0x7f8d17f40208>
name class transform prior trainable shape dtype value
SVGP.kernel.variance ParameterSoftplus True () float642.0
SVGP.kernel.lengthscalesParameterSoftplus True () float640.5
SVGP.likelihood.varianceParameterSoftplus + Shift True () float641.0
SVGP.inducing_variable.ZParameterIdentity True (10, 1) float64[[0....
SVGP.q_mu ParameterIdentity True (10, 1) float64[[0....
SVGP.q_sqrt ParameterFillTriangular True (1, 10, 10)float64[[[1., 0., 0....

Training using training_loss and training_loss_closure

GPflow models come with training_loss and training_loss_closure methods to make it easy to train your models. There is a slight difference between models that own their own data (most of them, e.g. GPR, VGP, …) and models that do not own the data (SVGP).

Model-internal data

For models that own their own data (inheriting from InternalDataTrainingLossMixin), data is provided at model construction time. In this case, model.training_loss does not take any arguments, and can be directly passed to an optimizer’s minimize() method:

vgp_model = gpflow.models.VGP(data, kernel, likelihood)
optimizer = tf.optimizers.Adam()
    vgp_model.training_loss, vgp_model.trainable_variables
)  # Note: this does a single step
# In practice, you will need to call minimize() many times, this will be further discussed below.
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>

This also works for the Scipy optimizer, though it will do the full optimization on a single call to minimize():

optimizer = gpflow.optimizers.Scipy()
    vgp_model.training_loss, vgp_model.trainable_variables, options=dict(maxiter=ci_niter(1000))
      fun: -67.28079128791462
 hess_inv: <5153x5153 LbfgsInvHessProduct with dtype=float64>
      jac: array([-2.25604415e-02, -2.78624139e-04, -1.50722830e-03, ...,
       -2.82622428e-06, -6.98264045e-08,  6.00954249e-05])
     nfev: 686
      nit: 642
   status: 0
  success: True
        x: array([ 1.71706932e+00,  7.81478265e-01, -4.65257992e+00, ...,
        7.76230361e-05, -1.30322312e-04,  9.98360367e-01])

You can obtain a compiled version using training_loss_closure, whose compile argument is True by default:

vgp_model.training_loss_closure()  # compiled
vgp_model.training_loss_closure(compile=True)  # compiled
vgp_model.training_loss_closure(compile=False)  # uncompiled, same as vgp_model.training_loss
<bound method InternalDataTrainingLossMixin.training_loss of <gpflow.models.vgp.VGP object at 0x7f8d17ab05f8>>

The SVGP model inherits from ExternalDataTrainingLossMixin and expects the data to be passed to training_loss(). For SVGP as for the other regression models, data is a two-tuple of (X, Y), where X is an array/tensor with shape (num_data, input_dim) and Y is an array/tensor with shape (num_data, output_dim):

assert isinstance(model, gpflow.models.SVGP)
<tf.Tensor: shape=(), dtype=float64, numpy=8174.404629452514>

To make optimizing it easy, it has a training_loss_closure() method, that takes the data and returns a closure that computes the training loss on this data:

optimizer = tf.optimizers.Adam()
training_loss = model.training_loss_closure(
)  # We save the compiled closure in a variable so as not to re-compile it each step
optimizer.minimize(training_loss, model.trainable_variables)  # Note that this does a single step
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=1>

SVGP can handle mini-batching, and an iterator from a batched can be passed to the model’s training_loss_closure():

batch_size = 5
batched_dataset =
training_loss = model.training_loss_closure(iter(batched_dataset))

optimizer.minimize(training_loss, model.trainable_variables)  # Note that this does a single step
<tf.Variable 'UnreadVariable' shape=() dtype=int64, numpy=2>

As previously, training_loss_closure takes an optional compile argument for tf.function compilation (True by default).

Training using Gradient Tapes

For a more elaborate example of a gradient update we can define an optimization_step that explicitly computes and applies gradients to the model. In TensorFlow 2, we can optimize (trainable) model parameters with TensorFlow optimizers using tf.GradientTape. In this simple example, we perform one gradient update of the Adam optimizer to minimize the training_loss (in this case the negative ELBO) of our model. The optimization_step can (and should) be wrapped in tf.function to be compiled to a graph if executing it many times.

def optimization_step(model: gpflow.models.SVGP, batch: Tuple[tf.Tensor, tf.Tensor]):
    with tf.GradientTape(watch_accessed_variables=False) as tape:
        loss = model.training_loss(batch)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

We can use the functionality of TensorFlow Datasets to define a simple training loop that iterates over batches of the training dataset:

def simple_training_loop(model: gpflow.models.SVGP, epochs: int = 1, logging_epoch_freq: int = 10):
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)
    for epoch in range(epochs):
        for _ in range(ci_niter(num_batches_per_epoch)):
            tf_optimization_step(model, next(batches))

        epoch_id = epoch + 1
        if epoch_id % logging_epoch_freq == 0:
            tf.print(f"Epoch {epoch_id}: ELBO (train) {model.elbo(data)}")
simple_training_loop(model, epochs=10, logging_epoch_freq=2)
Epoch 2: ELBO (train) -7903.988835809462
Epoch 4: ELBO (train) -7700.543396714054
Epoch 6: ELBO (train) -7494.8730472329435
Epoch 8: ELBO (train) -7291.223676702793
Epoch 10: ELBO (train) -7090.978918148187


gpflow.monitor provides a thin wrapper on top of tf.summary that makes it easy to monitor the training procedure. For a more detailed tutorial see the monitoring notebook.

from gpflow.monitor import (

samples_input = np.linspace(0, 10, 100).reshape(-1, 1)

def plot_model(fig, ax):
    mean, var = model.predict_f(samples_input)
    num_samples = 10
    samples = model.predict_f_samples(samples_input, num_samples)
    ax.plot(samples_input, mean, "C0", lw=2)
        samples_input[:, 0],
        mean[:, 0] - 1.96 * np.sqrt(var[:, 0]),
        mean[:, 0] + 1.96 * np.sqrt(var[:, 0]),
    ax.plot(X, Y, "kx")
    ax.plot(samples_input, samples[:, :, 0].numpy().T, "C0", linewidth=0.5)
    ax.set_ylim(-2.0, +2.0)
    ax.set_xlim(0, 10)

def print_cb(epoch_id=None, data=None):
    tf.print(f"Epoch {epoch_id}: ELBO (train)", model.elbo(data))

def elbo_cb(data=None, **_):
    return model.elbo(data)

output_logdir = enumerated_logdir()

model_task = ModelToTensorBoard(output_logdir, model)
elbo_task = ScalarToTensorBoard(output_logdir, elbo_cb, "elbo")
print_task = ExecuteCallback(callback=print_cb)

# We group these tasks and specify a period of `100` steps for them
fast_tasks = MonitorTaskGroup([model_task, elbo_task, print_task], period=100)

# We also want to see the model's fit during the optimisation
image_task = ImageToTensorBoard(output_logdir, plot_model, "samples_image")

# We typically don't want to plot too frequently during optimisation,
# which is why we specify a larger period for this task.
slow_taks = MonitorTaskGroup(image_task, period=500)
monitor = Monitor(fast_tasks, slow_taks)

def monitored_training_loop(epochs: int):
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)

    for epoch in range(epochs):
        for _ in range(ci_niter(num_batches_per_epoch)):
            batch = next(batches)
            tf_optimization_step(model, batch)

        epoch_id = epoch + 1
        monitor(epoch, epoch_id=epoch_id, data=data)

NOTE: for optimal performance it is recommended to wrap the monitoring inside tf.function. This is detailed in the monitoring notebook.

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

Epoch 1: ELBO (train) -7767.7469041457516
Epoch 101: ELBO (train) -2017.3622999663955
Epoch 201: ELBO (train) -838.90185619748468
Epoch 301: ELBO (train) -369.42569315470621
Epoch 401: ELBO (train) -155.69218941814947
Epoch 501: ELBO (train) -53.800683602864105
Epoch 601: ELBO (train) -7.258681953589047
Epoch 701: ELBO (train) 14.833461810170725
Epoch 801: ELBO (train) 26.963288283536809
Epoch 901: ELBO (train) 34.6189993133133

Then, we can use TensorBoard to examine the training procedure in more detail

# %tensorboard --logdir "{output_logdir}"

Saving and loading models


With the help of tf.train.CheckpointManager and tf.train.Checkpoint, we can checkpoint the model throughout the training procedure. Let’s start with a simple example using checkpointing to save and load a tf.Variable:

initial_value = 1.2
a = tf.Variable(initial_value)

Create Checkpoint object:

ckpt = tf.train.Checkpoint(a=a)
manager = tf.train.CheckpointManager(ckpt, output_logdir, max_to_keep=3)

Save the variable a and change its value right after:

_ = a.assign(0.33)

Now we can restore the old variable value:

print(f"Current value of variable a: {a.numpy():0.3f}")


print(f"Value of variable a after restore: {a.numpy():0.3f}")
Current value of variable a: 0.330
Value of variable a after restore: 1.200

In the example below, we modify a simple training loop to save the model every 100 epochs using the CheckpointManager.

model = gpflow.models.SVGP(
    kernel=kernel, likelihood=likelihood, inducing_variable=inducing_variable

def checkpointing_training_loop(
    model: gpflow.models.SVGP,
    batch_size: int,
    epochs: int,
    manager: tf.train.CheckpointManager,
    logging_epoch_freq: int = 100,
    epoch_var: Optional[tf.Variable] = None,
    step_var: Optional[tf.Variable] = None,
    tf_optimization_step = tf.function(optimization_step)

    batches = iter(train_dataset)

    for epoch in range(epochs):
        for step in range(ci_niter(num_batches_per_epoch)):
            tf_optimization_step(model, next(batches))
            if step_var is not None:
                step_var.assign(epoch * num_batches_per_epoch + step + 1)
        if epoch_var is not None:
            epoch_var.assign(epoch + 1)

        epoch_id = epoch + 1
        if epoch_id % logging_epoch_freq == 0:
            ckpt_path =
            tf.print(f"Epoch {epoch_id}: ELBO (train) {model.elbo(data)}, saved at {ckpt_path}")
step_var = tf.Variable(1, dtype=tf.int32, trainable=False)
epoch_var = tf.Variable(1, dtype=tf.int32, trainable=False)
ckpt = tf.train.Checkpoint(model=model, step=step_var, epoch=epoch_var)
manager = tf.train.CheckpointManager(ckpt, output_logdir, max_to_keep=5)

print(f"Checkpoint folder path at: {output_logdir}")

Checkpoint folder path at: /tmp/tensorboard/0
Epoch 100: ELBO (train) -148.90231634017277, saved at /tmp/tensorboard/0/ckpt-1
Epoch 200: ELBO (train) -8.330552873460503, saved at /tmp/tensorboard/0/ckpt-2
Epoch 300: ELBO (train) 24.82118324059526, saved at /tmp/tensorboard/0/ckpt-3
Epoch 400: ELBO (train) 36.764394118115185, saved at /tmp/tensorboard/0/ckpt-4
Epoch 500: ELBO (train) 42.126692248239664, saved at /tmp/tensorboard/0/ckpt-5
Epoch 600: ELBO (train) 45.04171897537742, saved at /tmp/tensorboard/0/ckpt-6
Epoch 700: ELBO (train) 46.89234938970012, saved at /tmp/tensorboard/0/ckpt-7
Epoch 800: ELBO (train) 48.22189698586411, saved at /tmp/tensorboard/0/ckpt-8
Epoch 900: ELBO (train) 49.27267657133913, saved at /tmp/tensorboard/0/ckpt-9
Epoch 1000: ELBO (train) 50.169791006141196, saved at /tmp/tensorboard/0/ckpt-10

After the models have been saved, we can restore them using tf.train.Checkpoint.restore and assert that their performance corresponds to that logged during training.

for i, recorded_checkpoint in enumerate(manager.checkpoints):
        f"{i} restored model from epoch {int(epoch_var)} [step:{int(step_var)}] : ELBO training set {model.elbo(data)}"
0 restored model from epoch 600 [step:1800] : ELBO training set 45.04171897537742
1 restored model from epoch 700 [step:2100] : ELBO training set 46.89234938970012
2 restored model from epoch 800 [step:2400] : ELBO training set 48.22189698586411
3 restored model from epoch 900 [step:2700] : ELBO training set 49.27267657133913
4 restored model from epoch 1000 [step:3000] : ELBO training set 50.169791006141196

Copying (hyper)parameter values between models

It is easy to interact with the set of all parameters of a model or a subcomponent programmatically.

The following returns a dictionary of all parameters within

model = gpflow.models.SGPR(data, kernel=kernel, inducing_variable=inducing_variable)
{'.kernel.variance': <Parameter: dtype=float64, shape=[], fn="softplus", numpy=0.8877937351786808>,
 '.kernel.lengthscales': <Parameter: dtype=float64, shape=[], fn="softplus", numpy=1.6801855585977894>,
 '.likelihood.variance': <Parameter: dtype=float64, shape=[], fn="chain_of_shift_of_softplus", numpy=1.0>,
 '.inducing_variable.Z': <Parameter: dtype=float64, shape=[10, 1], fn="identity", numpy=
 array([[ 0.        ],
        [ 1.11111111],
        [ 2.22222222],
        [ 3.33333333],
        [ 4.44444444],
        [ 5.55555556],
        [ 6.66666667],
        [ 7.77777778],
        [ 8.88888889],
        [10.        ]])>}

Such a dictionary can be assigned back to this model (or another model with the same tree of parameters) as follows:

params = gpflow.utilities.parameter_dict(model)
gpflow.utilities.multiple_assign(model, params)

TensorFlow saved_model

At present, TensorFlow does not support saving custom variables like instances of the gpflow.base.Parameter class, see this TensorFlow github issue.

However, once training is complete, it is possible to clone the model and replace all gpflow.base.Parameters with tf.constants holding the same value:

<gpflow.models.sgpr.SGPR object at 0x7f8d17f40748>
name class transform prior trainable shape dtype value
SGPR.kernel.variance ParameterSoftplus True () float640.8877937351786808
SGPR.kernel.lengthscalesParameterSoftplus True () float641.6801855585977894
SGPR.likelihood.varianceParameterSoftplus + Shift True () float641.0
SGPR.inducing_variable.ZParameterIdentity True (10, 1)float64[[0....
frozen_model = gpflow.utilities.freeze(model)

In order to save the model we need to define a tf.Module holding the tf.function’s that we wish to export, as well as a reference to the underlying model:

module_to_save = tf.Module()
predict_fn = tf.function(
    frozen_model.predict_f, input_signature=[tf.TensorSpec(shape=[None, 1], dtype=tf.float64)]
module_to_save.predict = predict_fn

Save original result for futher comparison. We also convert samples_input to a tensor. For a tensor input a tf.function will compile a single graph.

samples_input = tf.convert_to_tensor(samples_input, dtype=default_float())
original_result = module_to_save.predict(samples_input)

Let’s save the module

save_dir = str(pathlib.Path(tempfile.gettempdir())), save_dir)
INFO:tensorflow:Assets written to: /tmp/assets

Load module back as new instance and compare predict results

loaded_model = tf.saved_model.load(save_dir)
loaded_result = loaded_model.predict(samples_input)

np.testing.assert_array_equal(loaded_result, original_result)

User config update

In this notebook, we used a lot gpflow.config methods for setting and getting default attributes from global configuration. However, GPflow provides a way for local config modification without updating values in global. As you can see below, using gpflow.config.as_context replaces temporarily global config with your instance. At creation time, custom config instance uses standard values from the global config:

user_config = gpflow.config.Config(float=tf.float32, positive_bijector="exp")

user_str = "User config\t"
global_str = "Global config\t"

with gpflow.config.as_context(user_config):
    print(f"{user_str} gpflow.config.default_float = {gpflow.config.default_float()}")
        f"{user_str} gpflow.config.positive_bijector = {gpflow.config.default_positive_bijector()}"

print(f"{global_str} gpflow.config.default_float = {gpflow.config.default_float()}")
print(f"{global_str} gpflow.config.positive_bijector = {gpflow.config.default_positive_bijector()}")
User config      gpflow.config.default_float = <dtype: 'float32'>
User config      gpflow.config.positive_bijector = exp
Global config    gpflow.config.default_float = <class 'numpy.float64'>
Global config    gpflow.config.positive_bijector = softplus
with gpflow.config.as_context(user_config):
    p = gpflow.Parameter(1.1, transform=gpflow.utilities.positive())

p = gpflow.Parameter(1.1, transform=gpflow.utilities.positive())
User config     <Parameter: dtype=float32, shape=[], fn="exp", numpy=1.1>
Global config   <Parameter: dtype=float64, shape=[], fn="softplus", numpy=1.1>