# gpflow.models

## gpflow.models.BayesianGPLVM

class gpflow.models.BayesianGPLVM(data, X_data_mean, X_data_var, kernel, num_inducing_variables=None, inducing_variable=None, X_prior_mean=None, X_prior_var=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| elbo() | Construct a tensorflow function to compute the bound on the marginal likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the latent function at some new points. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

Parameters
• data (Tensor) –

• X_data_mean (Tensor) –

• X_data_var (Tensor) –

• kernel (Kernel) –

• num_inducing_variables (Optional[int]) –

__init__(data, X_data_mean, X_data_var, kernel, num_inducing_variables=None, inducing_variable=None, X_prior_mean=None, X_prior_var=None)[source]

Initialise Bayesian GPLVM object. This method only works with a Gaussian likelihood.

Parameters
• data (Tensor) – data matrix, size N (number of points) x D (dimensions)

• X_data_mean (Tensor) – initial latent positions, size N (number of points) x Q (latent dimensions).

• X_data_var (Tensor) – variance of latent positions ([N, Q]), for the initialisation of the latent space.

• kernel (Kernel) – kernel specification. The signature above declares no default for this argument, so a kernel must be supplied.

• num_inducing_variables (Optional[int]) – number of inducing points, M

• inducing_variable – matrix of inducing points, size M (inducing points) x Q (latent dimensions). By default, a random permutation of X_data_mean.

• X_prior_mean – prior mean used in KL term of bound. By default 0. Same size as X_data_mean.

• X_prior_var – prior variance used in KL term of bound. By default 1.
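The KL term mentioned for X_prior_mean and X_prior_var can be illustrated with a small NumPy sketch. It assumes that both the variational distribution over the latent positions and the prior are diagonal Gaussians, and that scalar prior values broadcast to shape [N, Q]. The helper name gauss_kl_diag is hypothetical and this is not GPflow's implementation.

```python
import numpy as np


def gauss_kl_diag(q_mean, q_var, p_mean, p_var):
    """KL[q || p] between diagonal Gaussians, summed over all N x Q entries."""
    q_mean = np.asarray(q_mean, dtype=float)
    q_var = np.asarray(q_var, dtype=float)
    # Scalars (e.g. the defaults 0 and 1) broadcast to the shape of q_mean.
    p_mean = np.broadcast_to(np.asarray(p_mean, dtype=float), q_mean.shape)
    p_var = np.broadcast_to(np.asarray(p_var, dtype=float), q_mean.shape)
    return 0.5 * np.sum(
        np.log(p_var) - np.log(q_var) - 1.0 + (q_var + (q_mean - p_mean) ** 2) / p_var
    )


N, Q = 5, 2
X_data_mean = np.zeros((N, Q))
X_data_var = np.ones((N, Q))

# q equals the prior, so the KL term vanishes.
kl_at_prior = gauss_kl_diag(X_data_mean, X_data_var, 0.0, 1.0)
# Shifting every latent mean by 1 costs 0.5 per entry.
kl_shifted = gauss_kl_diag(X_data_mean + 1.0, X_data_var, 0.0, 1.0)
```

With the default prior (mean 0, variance 1), the term is zero when the latent positions match the prior and grows as the means move away from it.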

elbo()[source]

Construct a TensorFlow function to compute the lower bound on the log marginal likelihood (the ELBO).

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

Compute the mean and variance of the latent function at some new points. Note that this is very similar to the SGPR prediction, for which there are notes in the SGPR notebook.

Note: This model does not allow full output covariances.

Parameters
• Xnew (Tensor) – points at which to predict

• full_cov (bool) –

• full_output_cov (bool) –

Return type

Tuple[Tensor, Tensor]

predict_log_density(data)[source]

Compute the log density of the data at the new data points.

Parameters

data (Tensor) –

Return type

Tensor

## gpflow.models.BayesianModel

class gpflow.models.BayesianModel(*args, **kwargs)[source]

Bases: gpflow.base.Module

Bayesian model.

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective(*args, **kwargs) | Objective for maximum likelihood estimation. |

Parameters
• args (Any) –

• kwargs (Any) –

_training_loss(*args, **kwargs)[source]

Training loss definition. To allow MAP (maximum a-posteriori) estimation, adds the log density of all priors to maximum_log_likelihood_objective().

Return type

Tensor
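The relationship between the objective, the log prior and the training loss can be sketched with a toy class. This is a hedged illustration that uses plain floats instead of tensors. The class ToyBayesianModel is hypothetical, and the sign convention (the loss is the negated sum) is an assumption that follows from the objective being something to maximise.

```python
class ToyBayesianModel:
    """Hypothetical stand-in for a BayesianModel subclass, using floats."""

    def __init__(self, log_likelihood_value, log_prior_value):
        self._log_likelihood_value = log_likelihood_value
        self._log_prior_value = log_prior_value

    def maximum_log_likelihood_objective(self):
        # A real model would compute this from data and parameters.
        return self._log_likelihood_value

    def log_prior_density(self):
        # A real model would sum the log prior densities of its parameters.
        return self._log_prior_value

    def log_posterior_density(self):
        return self.maximum_log_likelihood_objective() + self.log_prior_density()

    def _training_loss(self):
        # Negated so that minimising the loss maximises the log posterior (MAP).
        return -self.log_posterior_density()


model = ToyBayesianModel(log_likelihood_value=-12.5, log_prior_value=-1.5)
loss = model._training_loss()
```

If no priors are set, the log prior contributes zero and the loss reduces to the negative of the maximum likelihood objective.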

log_posterior_density(*args, **kwargs)[source]

This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). It assumes that maximum_log_likelihood_objective() is defined sensibly.

Return type

Tensor

log_prior_density()[source]

Sum of the log prior probability densities of all (constrained) variables in this model.

Return type

Tensor

abstract maximum_log_likelihood_objective(*args, **kwargs)[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

## gpflow.models.ExternalDataTrainingLossMixin

class gpflow.models.ExternalDataTrainingLossMixin[source]

Bases: object

Mixin utility for training loss methods for models that do not own their own data. It provides the training_loss(data) and training_loss_closure(data) methods documented below.

See InternalDataTrainingLossMixin for an equivalent mixin for models that do own their own data.

Methods

| Method | Description |
| --- | --- |
| training_loss(data) | Returns the training loss for this model. |
| training_loss_closure(data, *[, compile]) | Returns a closure that computes the training loss, which by default is wrapped in tf.function(). |

training_loss(data)[source]

Returns the training loss for this model.

Parameters

data (~Data) – the data to be used for computing the model objective.

Return type

Tensor

training_loss_closure(data, *, compile=True)[source]

Returns a closure that computes the training loss, which by default is wrapped in tf.function(). This can be disabled by passing compile=False.

Parameters
• data (Union[~Data, OwnedIterator]) – the data to be used by the closure for computing the model objective. Can be the full dataset or an iterator, e.g. iter(dataset.batch(batch_size)), where dataset is an instance of tf.data.Dataset.

• compile – if True, wrap training loss in tf.function()

Return type

Callable[[], Tensor]
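The difference between passing a full dataset and passing an iterator can be sketched in plain Python. This is an illustration under stated assumptions: make_training_loss_closure and toy_loss are hypothetical names, the tf.function() wrapping is omitted, and iterators are detected by a simple check for a __next__ method.

```python
from typing import Callable, Iterator, List, Tuple, Union

Batch = Tuple[List[float], List[float]]


def make_training_loss_closure(
    training_loss: Callable[[Batch], float],
    data: Union[Batch, Iterator[Batch]],
) -> Callable[[], float]:
    """Return a zero-argument closure over either a dataset or an iterator."""
    if hasattr(data, "__next__"):
        # Iterator: every call to the closure consumes the next minibatch.
        return lambda: training_loss(next(data))
    # Full dataset: every call to the closure sees the same data.
    return lambda: training_loss(data)


def toy_loss(batch: Batch) -> float:
    """Sum of squared differences, standing in for a model's training loss."""
    X, Y = batch
    return sum((x - y) ** 2 for x, y in zip(X, Y))


full_data = ([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
minibatches = iter([([1.0], [1.0]), ([2.0], [1.0]), ([3.0], [1.0])])

full_closure = make_training_loss_closure(toy_loss, full_data)
batch_closure = make_training_loss_closure(toy_loss, minibatches)

full_value = full_closure()
batch_values = [batch_closure() for _ in range(3)]
```

Because the closure takes no arguments, an optimizer can call it repeatedly without knowing whether the data is fixed or streamed.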

## gpflow.models.GPLVM

class gpflow.models.GPLVM(data, latent_dim, X_data_mean=None, kernel=None, mean_function=None)[source]

Bases: gpflow.models.gpr.GPR

Standard GPLVM where the likelihood can be optimised with respect to the latent X.

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| log_marginal_likelihood() | Computes the log marginal likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | This method computes predictions at X in R^{N x D} input points |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

Parameters
• data (Tensor) –

• latent_dim (int) –

• X_data_mean (Optional[Tensor]) –

• kernel (Optional[Kernel]) –

• mean_function (Optional[MeanFunction]) –

__init__(data, latent_dim, X_data_mean=None, kernel=None, mean_function=None)[source]

Initialise GPLVM object. This method only works with a Gaussian likelihood.

Parameters
• data (Tensor) – y data matrix, size N (number of points) x D (dimensions)

• latent_dim (int) – the number of latent dimensions (Q)

• X_data_mean (Optional[Tensor]) – latent positions ([N, Q]), for the initialisation of the latent space.

• kernel (Optional[Kernel]) – kernel specification, by default Squared Exponential

• mean_function (Optional[MeanFunction]) – mean function, by default None.

## gpflow.models.GPMC

class gpflow.models.GPMC(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| log_likelihood() | Construct a tf function to compute the likelihood of a general GP model. |
| log_posterior_density() | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | Xnew is a data matrix of the points at which we want to predict. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

__init__(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

data is a tuple (X, Y), where X is a data matrix of size [N, D] and Y is a data matrix of size [N, R]. kernel, likelihood and mean_function are appropriate GPflow objects.

This is a vanilla implementation of a GP with a non-Gaussian likelihood. The latent function values are represented by centered (whitened) variables, so

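\begin{align} \mathbf v & \sim N(0, \mathbf I) \\ \mathbf f &= \mathbf L\mathbf v + m(x) \end{align}

with

$\mathbf L \mathbf L^\top = \mathbf K$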
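The whitened representation can be sketched in NumPy as follows. This is a minimal illustration, assuming a squared exponential kernel, a zero mean function and a small diagonal jitter for numerical stability. The helper squared_exponential and all numeric values are illustrative choices, not GPflow code.

```python
import numpy as np


def squared_exponential(X, lengthscale=0.3, variance=1.0):
    """Squared exponential kernel matrix for inputs X of shape [N, D]."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 6)[:, None]          # [N, D] with N = 6, D = 1

K = squared_exponential(X) + 1e-6 * np.eye(6)  # jitter keeps K positive definite
L = np.linalg.cholesky(K)                      # lower triangular, L @ L.T == K

v = rng.standard_normal((6, 1))                # whitened variables, v ~ N(0, I)
m = np.zeros((6, 1))                           # zero mean function (assumption)
f = L @ v + m                                  # latent function values at X
```

Sampling or optimising over v rather than f keeps the prior on the free variables a standard normal, whatever the kernel hyperparameters are.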


log_likelihood()[source]

Construct a tf function to compute the log likelihood of a general GP model:

log p(Y | V, theta).

Return type

Tensor

log_posterior_density()[source]

This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). It assumes that maximum_log_likelihood_objective() is defined sensibly.

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

Xnew is a data matrix of the points at which we want to predict.

This method computes

p(F* | (F=LV) )

where F* are points on the GP at Xnew, F=LV are points on the GP at X.

Parameters

Xnew (Tensor) –

Return type

Tuple[Tensor, Tensor]

## gpflow.models.GPModel

class gpflow.models.GPModel(kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

Bases: gpflow.models.model.BayesianModel

A stateless base class for Gaussian process models, that is, those of the form

\begin{align} \theta & \sim p(\theta) \\ f & \sim \mathcal{GP}(m(x), k(x, x'; \theta)) \\ f_i & = f(x_i) \\ y_i \,|\, f_i & \sim p(y_i|f_i) \end{align}

This class mostly adds functionality for predictions. To use it, inheriting classes must define a predict_f function, which computes the means and variances of the latent function.

These predictions are then pushed through the likelihood to obtain the means and variances of held-out data via self.predict_y.

The predictions can also be used to compute the (log) density of held-out data via self.predict_log_density.

It is also possible to draw samples from the latent GPs using self.predict_f_samples.

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective(*args, **kwargs) | Objective for maximum likelihood estimation. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| predict_f | |

Parameters
• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

static calc_num_latent_gps(kernel, likelihood, output_dim)[source]

Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel.

Note: It’s not nice for GPModel to need to be aware of specific likelihoods as here. However, num_latent_gps is a bit more broken in general, we should fix this in the future. There are also some slightly problematic assumptions re the output dimensions of mean_function. See https://github.com/GPflow/GPflow/issues/1343

Parameters
• kernel (Kernel) –

• likelihood (Likelihood) –

• output_dim (int) –

Return type

int

static calc_num_latent_gps_from_data(data, kernel, likelihood)[source]

Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood.

Parameters
• kernel (Kernel) –

• likelihood (Likelihood) –

Return type

int

predict_f_samples(Xnew, num_samples=None, full_cov=True, full_output_cov=False)[source]

Produce samples from the posterior latent function(s) at the input points.

Parameters
• Xnew (Tensor) – InputData Input locations at which to draw samples, shape […, N, D] where N is the number of rows and D is the input dimension of each point.

• num_samples (Optional[int]) – Number of samples to draw. If None, a single sample is drawn and the return shape is […, N, P]. For any positive integer, the return shape contains an extra batch dimension, […, S, N, P], where S = num_samples and P is the number of outputs.

• full_cov (bool) – If True, draw correlated samples over the inputs. Computes the Cholesky over the dense covariance matrix of size [num_data, num_data]. If False, draw samples that are uncorrelated over the inputs.

• full_output_cov (bool) – If True, draw correlated samples over the outputs. If False, draw samples that are uncorrelated over the outputs.

Currently, the method does not support setting both full_output_cov=True and full_cov=True at the same time.

Return type

Tensor
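The effect of num_samples and full_cov on the output shape can be sketched in NumPy. This is an illustration under stated assumptions: the predictive mean has shape [N, P], the full covariance has shape [P, N, N], and the marginal variances have shape [N, P]. The function sample_latent is hypothetical and not GPflow's implementation.

```python
import numpy as np


def sample_latent(mean, cov_or_var, num_samples=None, full_cov=True, rng=None):
    """Draw Gaussian samples given mean [N, P] and cov [P, N, N] or var [N, P]."""
    rng = np.random.default_rng() if rng is None else rng
    S = 1 if num_samples is None else num_samples
    N, P = mean.shape
    eps = rng.standard_normal((S, N, P))
    if full_cov:
        # Cholesky of the dense [N, N] covariance for each output p.
        chol = np.linalg.cholesky(cov_or_var + 1e-6 * np.eye(N))  # [P, N, N]
        # Correlate the noise across inputs, separately for each output.
        samples = mean + np.einsum("pnm,smp->snp", chol, eps)
    else:
        # Uncorrelated across inputs: scale by the marginal standard deviation.
        samples = mean + np.sqrt(cov_or_var) * eps
    # Drop the sample dimension when a single sample was requested via None.
    return samples[0] if num_samples is None else samples


rng = np.random.default_rng(0)
N, P = 4, 2
mean = np.zeros((N, P))
cov = np.stack([np.eye(N)] * P)   # [P, N, N]
var = np.ones((N, P))             # [N, P]

single = sample_latent(mean, cov, rng=rng)
many = sample_latent(mean, var, num_samples=3, full_cov=False, rng=rng)
```

The full_cov=True branch needs a Cholesky factorisation of an N x N matrix per output, which is why it is the more expensive option for many input points.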

predict_log_density(data, full_cov=False, full_output_cov=False)[source]

Compute the log density of the data at the new data points.

Parameters
• data (Tuple[Tensor, Tensor]) –

• full_cov (bool) –

• full_output_cov (bool) –

Return type

Tensor

predict_y(Xnew, full_cov=False, full_output_cov=False)[source]

Compute the mean and variance of the held-out data at the input points.

Parameters
• Xnew (Tensor) –

• full_cov (bool) –

• full_output_cov (bool) –

Return type

Tuple[Tensor, Tensor]

## gpflow.models.GPR

class gpflow.models.GPR(data, kernel, mean_function=None, noise_variance=1.0)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

Gaussian Process Regression.

This is a vanilla implementation of GP regression with a Gaussian likelihood. Multiple columns of Y are treated independently.

The log likelihood of this model is given by

$\log p(Y \,|\, \mathbf f) = \log \mathcal N(Y \,|\, \mathbf f, \sigma_n^2 \mathbf{I})$

To train the model, we maximise the log marginal likelihood w.r.t. the likelihood variance $\sigma_n^2$ and the kernel hyperparameters $\theta$. The marginal likelihood is found by integrating the likelihood over the prior on $\mathbf f$ (here with a zero mean function), and has the form

$\log p(Y \,|\, \sigma_n, \theta) = \log \mathcal N(Y \,|\, 0, \mathbf{K} + \sigma_n^2 \mathbf{I})$
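The log marginal likelihood above can be evaluated through a Cholesky factorisation of K + σ_n² I. The NumPy sketch below assumes a zero mean function and sums over the columns of Y, which are treated independently. The function name and the numeric values are illustrative assumptions, not GPflow code.

```python
import numpy as np


def gpr_log_marginal_likelihood(K, Y, noise_variance):
    """log N(Y | 0, K + noise_variance * I), summed over the columns of Y."""
    N = Y.shape[0]
    L = np.linalg.cholesky(K + noise_variance * np.eye(N))
    alpha = np.linalg.solve(L, Y)                  # L^{-1} Y, shape [N, R]
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log |K + noise_variance * I|
    per_column = (
        -0.5 * np.sum(alpha**2, axis=0)            # quadratic form per column
        - 0.5 * log_det
        - 0.5 * N * np.log(2.0 * np.pi)
    )
    return float(np.sum(per_column))


K = np.array([[1.0, 0.5], [0.5, 1.0]])
Y = np.array([[1.0], [2.0]])
noise_variance = 0.1
lml = gpr_log_marginal_likelihood(K, Y, noise_variance)
```

The Cholesky route avoids forming an explicit inverse, and the log determinant comes directly from the diagonal of the factor.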
Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| log_marginal_likelihood() | Computes the log marginal likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | This method computes predictions at X in R^{N x D} input points |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• mean_function (Optional[MeanFunction]) –

• noise_variance (float) –

_add_noise_cov(K)[source]

Returns K + σ² I, where σ² is the likelihood noise variance (scalar), and I is the corresponding identity matrix.

Parameters

K (Tensor) –

Return type

Tensor

log_marginal_likelihood()[source]

Computes the log marginal likelihood.

$\log p(Y | \theta).$
Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

This method computes predictions at the input points Xnew in R^{N x D}:

$p(F^* \,|\, Y)$

where F* are points on the GP at the new data points and Y are noisy observations at the training data points.

Parameters
• Xnew (Tensor) –

• full_cov (bool) –

• full_output_cov (bool) –

Return type

Tuple[Tensor, Tensor]
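The predictive distribution p(F* | Y) for GP regression has a closed form, sketched below in NumPy. The sketch assumes a zero mean function and returns only marginal variances, which corresponds to full_cov=False. The function gpr_predict_f and its argument names are hypothetical, and the kernel matrices are passed in precomputed.

```python
import numpy as np


def gpr_predict_f(K_train, K_cross, k_new_diag, Y, noise_variance):
    """Posterior mean [N*, R] and marginal variance [N*, R] of the latent function.

    K_train:    [N, N]  kernel matrix between training inputs
    K_cross:    [N, N*] kernel matrix between training and new inputs
    k_new_diag: [N*]    prior variances at the new inputs
    """
    N = K_train.shape[0]
    L = np.linalg.cholesky(K_train + noise_variance * np.eye(N))
    A = np.linalg.solve(L, K_cross)           # L^{-1} K_cross, shape [N, N*]
    V = np.linalg.solve(L, Y)                 # L^{-1} Y, shape [N, R]
    mean = A.T @ V                            # [N*, R]
    var = k_new_diag - np.sum(A**2, axis=0)   # [N*]
    # The same variance applies to every output column.
    return mean, var[:, None] * np.ones((1, Y.shape[1]))


# One training point, predicting at the same location.
mean, var = gpr_predict_f(
    K_train=np.array([[1.0]]),
    K_cross=np.array([[1.0]]),
    k_new_diag=np.array([1.0]),
    Y=np.array([[2.0]]),
    noise_variance=1.0,
)
```

In this single-point case the prior variance and the noise variance are equal, so the posterior mean is half the observation and the posterior variance is half the prior variance.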

## gpflow.models.GPRFITC

class gpflow.models.GPRFITC(data, kernel, inducing_variable, *, mean_function=None, num_latent_gps=None, noise_variance=1.0)[source]

This implements GP regression with the FITC approximation. The key reference is

@inproceedings{Snelson06sparsegaussian,
author = {Edward Snelson and Zoubin Ghahramani},
title = {Sparse Gaussian Processes using Pseudo-inputs},
booktitle = {Advances In Neural Information Processing Systems},
year = {2006},
pages = {1257--1264},
publisher = {MIT press}
}


The implementation is loosely based on code from the GPML MATLAB library, although gradients are computed automatically in GPflow.

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| fitc_log_marginal_likelihood() | Construct a tensorflow function to compute the bound on the marginal likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the latent function at some new points Xnew. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |
| upper_bound() | Upper bound for the sparse GP regression marginal likelihood. |
| common_terms | |

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• inducing_variable (InducingPoints) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

• noise_variance (float) –

fitc_log_marginal_likelihood()[source]

Construct a tensorflow function to compute the bound on the marginal likelihood.

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

Compute the mean and variance of the latent function at some new points Xnew.

Parameters

Xnew (Tensor) –

Return type

Tuple[Tensor, Tensor]

## gpflow.models.InternalDataTrainingLossMixin

class gpflow.models.InternalDataTrainingLossMixin[source]

Bases: object

Mixin utility for training loss methods for models that own their own data. It provides the training_loss() and training_loss_closure() methods documented below.

See ExternalDataTrainingLossMixin for an equivalent mixin for models that do not own their own data.

Methods

| Method | Description |
| --- | --- |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

training_loss()[source]

Returns the training loss for this model.

Return type

Tensor

training_loss_closure(*, compile=True)[source]

Convenience method. Returns a closure which itself returns the training loss. This closure can be passed to the minimize methods on gpflow.optimizers.Scipy and subclasses of tf.optimizers.Optimizer.

Parameters

compile – If True (default), compile the training loss function in a TensorFlow graph by wrapping it in tf.function()

Return type

Callable[[], Tensor]

## gpflow.models.SGPMC

class gpflow.models.SGPMC(data, kernel, likelihood, mean_function=None, num_latent_gps=None, inducing_variable=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

This is the Sparse Variational GP using MCMC (SGPMC). The key reference is

@inproceedings{hensman2015mcmc,
title={MCMC for Variationally Sparse Gaussian Processes},
author={Hensman, James and Matthews, Alexander G. de G.
and Filippone, Maurizio and Ghahramani, Zoubin},
booktitle={Proceedings of NIPS},
year={2015}
}


The latent function values are represented by centered (whitened) variables, so

\begin{align} \mathbf v & \sim N(0, \mathbf I) \\ \mathbf u &= \mathbf L\mathbf v \end{align}

with

$\mathbf L \mathbf L^\top = \mathbf K$
Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| log_likelihood_lower_bound() | This function computes the optimal density for v, q*(v), up to a constant. |
| log_posterior_density() | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | Xnew is a data matrix of the points at which we want to predict |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

• inducing_variable (Optional[InducingPoints]) –

__init__(data, kernel, likelihood, mean_function=None, num_latent_gps=None, inducing_variable=None)[source]

data is a tuple (X, Y), where X is a data matrix of size [N, D] and Y is a data matrix of size [N, R]. Z is a data matrix of inducing inputs, of size [M, D]. kernel, likelihood and mean_function are appropriate GPflow objects.


log_likelihood_lower_bound()[source]

This function computes the optimal density for v, q*(v), up to a constant

Return type

Tensor

log_posterior_density()[source]

This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). It assumes that maximum_log_likelihood_objective() is defined sensibly.

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

Xnew is a data matrix of the points at which we want to predict

This method computes

p(F* | (U=LV) )

where F* are points on the GP at Xnew and U=LV are points on the GP at Z.

Parameters

Xnew (Tensor) –

Return type

Tuple[Tensor, Tensor]

## gpflow.models.SGPR

class gpflow.models.SGPR(data, kernel, inducing_variable, *, mean_function=None, num_latent_gps=None, noise_variance=1.0)[source]

Sparse Variational GP regression. The key reference is

@inproceedings{titsias2009variational,
title={Variational learning of inducing variables in
sparse Gaussian processes},
author={Titsias, Michalis K},
booktitle={International Conference on
Artificial Intelligence and Statistics},
pages={567--574},
year={2009}
}

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| compute_qu() | Computes the mean and variance of q(u) = N(mu, cov), the variational distribution on inducing outputs. |
| elbo() | Construct a tensorflow function to compute the bound on the marginal likelihood. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective() | Objective for maximum likelihood estimation. |
| predict_f(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the latent function at some new points Xnew. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss() | Returns the training loss for this model. |
| training_loss_closure(*[, compile]) | Convenience method. |
| upper_bound() | Upper bound for the sparse GP regression marginal likelihood. |

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• inducing_variable (InducingPoints) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

• noise_variance (float) –

compute_qu()[source]

Computes the mean and variance of q(u) = N(mu, cov), the variational distribution on inducing outputs. SVGP with this q(u) should predict identically to SGPR.

Returns

mu, cov

Return type

Tuple[Tensor, Tensor]

elbo()[source]

Construct a tensorflow function to compute the bound on the marginal likelihood. For a derivation of the terms in here, see the associated SGPR notebook.

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]

Compute the mean and variance of the latent function at some new points Xnew. For a derivation of the terms in here, see the associated SGPR notebook.

Parameters

Xnew (Tensor) –

Return type

Tuple[Tensor, Tensor]

## gpflow.models.SVGP

class gpflow.models.SVGP(kernel, likelihood, inducing_variable, *, mean_function=None, num_latent_gps=1, q_diag=False, q_mu=None, q_sqrt=None, whiten=True, num_data=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.ExternalDataTrainingLossMixin

This is the Sparse Variational GP (SVGP). The key reference is

@inproceedings{hensman2014scalable,
title={Scalable Variational Gaussian Process Classification},
author={Hensman, James and Matthews, Alexander G. de G. and Ghahramani, Zoubin},
booktitle={Proceedings of AISTATS},
year={2015}
}

Attributes
parameters
trainable_parameters

Methods

| Method | Description |
| --- | --- |
| __call__(*args, **kwargs) | Call self as a function. |
| calc_num_latent_gps(kernel, likelihood, …) | Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel. |
| calc_num_latent_gps_from_data(data, kernel, …) | Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood. |
| elbo(data) | This gives a variational bound (the evidence lower bound or ELBO) on the log marginal likelihood of the model. |
| log_posterior_density(*args, **kwargs) | This may be the posterior with respect to the hyperparameters (e.g. for GPR) or the posterior with respect to the function (e.g. for GPMC and SGPMC). |
| log_prior_density() | Sum of the log prior probability densities of all (constrained) variables in this model. |
| maximum_log_likelihood_objective(data) | Objective for maximum likelihood estimation. |
| predict_f_samples(Xnew[, num_samples, …]) | Produce samples from the posterior latent function(s) at the input points. |
| predict_log_density(data[, full_cov, …]) | Compute the log density of the data at the new data points. |
| predict_y(Xnew[, full_cov, full_output_cov]) | Compute the mean and variance of the held-out data at the input points. |
| training_loss(data) | Returns the training loss for this model. |
| training_loss_closure(data, *[, compile]) | Returns a closure that computes the training loss, which by default is wrapped in tf.function(). |
| predict_f | |
| prior_kl | |

Parameters
• num_latent_gps (int) –

• q_diag (bool) –

• whiten (bool) –

__init__(kernel, likelihood, inducing_variable, *, mean_function=None, num_latent_gps=1, q_diag=False, q_mu=None, q_sqrt=None, whiten=True, num_data=None)[source]
• kernel, likelihood, inducing_variable, mean_function are appropriate GPflow objects

• num_latent_gps is the number of latent processes to use, defaults to 1

• q_diag is a boolean. If True, the covariance is approximated by a diagonal matrix.

• whiten is a boolean. If True, we use the whitened representation of the inducing points.

• num_data is the total number of observations, defaults to X.shape[0] (relevant when feeding in external minibatches)

Parameters
• num_latent_gps (int) –

• q_diag (bool) –

• whiten (bool) –

_init_variational_parameters(num_inducing, q_mu, q_sqrt, q_diag)[source]

Constructs the mean and the Cholesky factor of the covariance of the variational Gaussian posterior. If the user passes values for q_mu and q_sqrt, the routine checks that their shapes are consistent and correct. If the user does not specify q_mu and q_sqrt, the routine initialises them; their shapes depend on num_inducing and q_diag.

Note: the comments most often denote the number of observations (= output dimensions) by P, the number of latent GPs by L, and the number of inducing points by M. Typically P equals L, but this can change when certain multioutput kernels are used.

Parameters
• num_inducing (int) – Number of inducing variables, typically referred to as M.

• q_mu (np.array or None) – Mean of the variational Gaussian posterior. If None, the function initialises the mean with zeros. If not None, the shape of q_mu is checked.

• q_sqrt (np.array or None) – Cholesky factor of the covariance of the variational Gaussian posterior. If None, the function initialises q_sqrt with the identity matrix. If not None, the shape of q_sqrt is checked, depending on q_diag.

• q_diag (bool) – Used to check whether q_mu and q_sqrt have the correct shape, or to construct them with the correct shape. If q_diag is True, q_sqrt is two-dimensional and only holds the square roots of the diagonal elements of the covariance. If False, q_sqrt is three-dimensional.
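The initialisation and shape checks described above can be sketched in NumPy. This is a simplified illustration rather than GPflow's own code: the function name, the explicit num_latent_gps argument, and the concrete shapes ([M, L] for q_mu and for a diagonal q_sqrt, [L, M, M] for a full q_sqrt) are assumptions made for the example.

```python
import numpy as np


def init_variational_parameters(num_inducing, num_latent_gps,
                                q_mu=None, q_sqrt=None, q_diag=False):
    """Hypothetical helper: build or validate q_mu and q_sqrt."""
    M, L = num_inducing, num_latent_gps

    # Mean: zeros unless the caller supplies a value.
    q_mu = np.zeros((M, L)) if q_mu is None else np.asarray(q_mu, dtype=float)
    if q_mu.shape != (M, L):
        raise ValueError(f"q_mu must have shape {(M, L)}, got {q_mu.shape}")

    # Square root of the covariance: identity unless supplied.
    if q_sqrt is None:
        if q_diag:
            # Only the square roots of the diagonal entries are stored.
            q_sqrt = np.ones((M, L))
        else:
            # One M x M identity (a valid Cholesky factor) per latent GP.
            q_sqrt = np.tile(np.eye(M), (L, 1, 1))
    q_sqrt = np.asarray(q_sqrt, dtype=float)

    expected = (M, L) if q_diag else (L, M, M)
    if q_sqrt.shape != expected:
        raise ValueError(f"q_sqrt must have shape {expected}, got {q_sqrt.shape}")
    return q_mu, q_sqrt


q_mu_full, q_sqrt_full = init_variational_parameters(5, 2)
q_mu_diag, q_sqrt_diag = init_variational_parameters(5, 2, q_diag=True)
```

With q_diag=False the default q_sqrt is three-dimensional, and with q_diag=True it is two-dimensional, matching the description of q_diag above.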

elbo(data)[source]

This gives a variational bound (the evidence lower bound or ELBO) on the log marginal likelihood of the model.

Parameters

data (Tuple[Tensor, Tensor]) –

Return type

Tensor
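The role of num_data when minibatches are fed in can be illustrated with a small sketch. It assumes that the per-point expected log-likelihood terms are summed over the batch and rescaled by num_data divided by the batch size before the KL term is subtracted; the helper minibatch_elbo and all numbers are hypothetical.

```python
def minibatch_elbo(var_exp, kl, num_data=None):
    """Toy ELBO estimate from per-point expected log-likelihood terms.

    var_exp  : list of E_q[log p(y_n | f_n)] values for the points in the batch
    kl       : KL divergence between the variational posterior and the prior
    num_data : total number of observations; None means no rescaling
    """
    scale = 1.0 if num_data is None else num_data / len(var_exp)
    return sum(var_exp) * scale - kl


# Four made-up per-point terms and a made-up KL value.
all_terms = [-1.0, -2.0, -3.0, -4.0]
kl = 0.5

full_elbo = minibatch_elbo(all_terms, kl)

# Two minibatches of size 2, each rescaled to the full data set size.
batch_estimates = [
    minibatch_elbo(all_terms[i:i + 2], kl, num_data=len(all_terms))
    for i in (0, 2)
]
mean_estimate = sum(batch_estimates) / len(batch_estimates)
```

Averaged over the minibatches, the rescaled estimate agrees with the full-data value, which is why the total number of observations has to be known when the data are supplied externally in batches.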

maximum_log_likelihood_objective(data)[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Parameters

data (Tuple[Tensor, Tensor]) –

Return type

Tensor

## gpflow.models.VGP¶

class gpflow.models.VGP(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

This method approximates the Gaussian process posterior using a multivariate Gaussian.

The idea is that the posterior over the function-value vector F is approximated by a Gaussian, and the KL divergence is minimised between the approximation and the posterior.

This implementation is equivalent to SVGP with X=Z, but is more efficient. The whitened representation is used to aid optimization.

The posterior approximation is

$q(\mathbf f) = N(\mathbf f \,|\, \boldsymbol \mu, \boldsymbol \Sigma)$
Attributes
parameters
trainable_parameters

Methods

• __call__(*args, **kwargs) – Call self as a function.

• calc_num_latent_gps(kernel, likelihood, …) – Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel.

• calc_num_latent_gps_from_data(data, kernel, …) – Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood.

• elbo() – This method computes the variational lower bound on the likelihood, which is:

• log_posterior_density(*args, **kwargs) – This may be the posterior with respect to the hyperparameters (e.g.

• log_prior_density() – Sum of the log prior probability densities of all (constrained) variables in this model.

• maximum_log_likelihood_objective() – Objective for maximum likelihood estimation.

• predict_f_samples(Xnew[, num_samples, …]) – Produce samples from the posterior latent function(s) at the input points.

• predict_log_density(data[, full_cov, …]) – Compute the log density of the data at the new data points.

• predict_y(Xnew[, full_cov, full_output_cov]) – Compute the mean and variance of the held-out data at the input points.

• training_loss() – Returns the training loss for this model.

• training_loss_closure(*[, compile]) – Convenience method.

• predict_f

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

__init__(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

• data = (X, Y) contains the input points [N, D] and the observations [N, P]

• kernel, likelihood, mean_function are appropriate GPflow objects

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

elbo()[source]

This method computes the variational lower bound on the likelihood, which is:

E_{q(F)} [ log p(Y|F) ] - KL[ q(F) || p(F)]

with

$q(\mathbf f) = N(\mathbf f \,|\, \boldsymbol \mu, \boldsymbol \Sigma)$

Return type

Tensor
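The bound E_{q(F)} [ log p(Y|F) ] - KL[ q(F) || p(F)] can be checked numerically in the one case where everything is available in closed form, a Gaussian likelihood. The NumPy sketch below is an illustration under stated assumptions, not GPflow's implementation: it uses a zero-mean prior, an unwhitened parameterisation of q, a hand-written squared-exponential kernel, and made-up data, and all function names are hypothetical.

```python
import numpy as np


def gauss_kl(mu, Sigma, K):
    """KL[ N(mu, Sigma) || N(0, K) ] for N-dimensional Gaussians."""
    N = mu.shape[0]
    _, logdet_K = np.linalg.slogdet(K)
    _, logdet_S = np.linalg.slogdet(Sigma)
    trace_term = np.trace(np.linalg.solve(K, Sigma))
    mahalanobis = mu @ np.linalg.solve(K, mu)
    return 0.5 * (trace_term + mahalanobis - N + logdet_K - logdet_S)


def expected_log_lik(y, mu, Sigma, noise_var):
    """E_q[ log N(y | f, noise_var * I) ] under q(f) = N(mu, Sigma)."""
    N = y.shape[0]
    quad = np.sum((y - mu) ** 2) + np.trace(Sigma)
    return -0.5 * N * np.log(2 * np.pi * noise_var) - 0.5 * quad / noise_var


def elbo(y, mu, Sigma, K, noise_var):
    return expected_log_lik(y, mu, Sigma, noise_var) - gauss_kl(mu, Sigma, K)


def log_marginal_likelihood(y, K, noise_var):
    """Exact log p(y) for a zero-mean GP with Gaussian noise."""
    N = y.shape[0]
    C = K + noise_var * np.eye(N)
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet_C + N * np.log(2 * np.pi))


# Made-up one-dimensional data and a squared-exponential kernel matrix.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 5)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3 ** 2) + 1e-8 * np.eye(5)
y = rng.normal(size=5)
noise_var = 0.1

# Exact posterior over f, used here as the best possible q.
C = K + noise_var * np.eye(5)
mu_post = K @ np.linalg.solve(C, y)
Sigma_post = K - K @ np.linalg.solve(C, K)

lml = log_marginal_likelihood(y, K, noise_var)
elbo_at_posterior = elbo(y, mu_post, Sigma_post, K, noise_var)
elbo_at_prior = elbo(y, np.zeros(5), K, K, noise_var)
```

When q equals the exact posterior the bound is tight, and for any other choice (here, q set to the prior) it lies below the log marginal likelihood. For non-Gaussian likelihoods the expectation term generally has no closed form, which is where a variational model such as this one is useful.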

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

## gpflow.models.VGPOpperArchambeau¶

class gpflow.models.VGPOpperArchambeau(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

Bases: gpflow.models.model.GPModel, gpflow.models.training_mixins.InternalDataTrainingLossMixin

This method approximates the Gaussian process posterior using a multivariate Gaussian. The key reference is:

@article{Opper:2009,
title = {The Variational Gaussian Approximation Revisited},
author = {Opper, Manfred and Archambeau, Cedric},
journal = {Neural Comput.},
year = {2009},
pages = {786--792},
}


The idea is that the posterior over the function-value vector F is approximated by a Gaussian, and the KL divergence is minimised between the approximation and the posterior. It turns out that the optimal posterior precision shares off-diagonal elements with the prior, so only the diagonal elements of the precision need be adjusted. The posterior approximation is

$q(\mathbf f) = N(\mathbf f \,|\, \mathbf K \boldsymbol \alpha, [\mathbf K^{-1} + \textrm{diag}(\boldsymbol \lambda)^2]^{-1})$


This approach has only $2ND$ parameters, rather than the $N + N^2$ of VGP, but the optimization is non-convex and may cause difficulty in practice.
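The difference in parameter counts grows quickly with the number of data points. The short sketch below tallies both counts, assuming that the second factor in the first count is the number of latent functions (written R in the elbo documentation of this class) and that the full-Gaussian count applies once per latent function; the helper name is hypothetical.

```python
def num_variational_parameters(N, R=1):
    """Return (Opper-Archambeau count, full-Gaussian count).

    N : number of data points
    R : number of latent functions (assumed meaning of the second factor)
    """
    # alpha and lambda, each of size [N, R]
    opper_archambeau = 2 * N * R
    # a mean vector of length N and an N x N matrix per latent function
    full_gaussian = R * (N + N * N)
    return opper_archambeau, full_gaussian


counts_small = num_variational_parameters(N=100)        # (200, 10100)
counts_large = num_variational_parameters(N=1000, R=2)  # (4000, 2002000)
```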

Attributes
parameters
trainable_parameters

Methods

• __call__(*args, **kwargs) – Call self as a function.

• calc_num_latent_gps(kernel, likelihood, …) – Calculates the number of latent GPs required given the number of outputs output_dim and the type of likelihood and kernel.

• calc_num_latent_gps_from_data(data, kernel, …) – Calculates the number of latent GPs required based on the data as well as the type of kernel and likelihood.

• elbo() – q_alpha, q_lambda are variational parameters, size [N, R]. This method computes the variational lower bound on the likelihood, which is: E_{q(F)} [ log p(Y|F) ] - KL[ q(F) || p(F)] with q(f) = N(f | K alpha + mean, [K^-1 + diag(square(lambda))]^-1).

• log_posterior_density(*args, **kwargs) – This may be the posterior with respect to the hyperparameters (e.g.

• log_prior_density() – Sum of the log prior probability densities of all (constrained) variables in this model.

• maximum_log_likelihood_objective() – Objective for maximum likelihood estimation.

• predict_f(Xnew[, full_cov, full_output_cov]) – The posterior variance of F is given by

• predict_f_samples(Xnew[, num_samples, …]) – Produce samples from the posterior latent function(s) at the input points.

• predict_log_density(data[, full_cov, …]) – Compute the log density of the data at the new data points.

• predict_y(Xnew[, full_cov, full_output_cov]) – Compute the mean and variance of the held-out data at the input points.

• training_loss() – Returns the training loss for this model.

• training_loss_closure(*[, compile]) – Convenience method.

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

__init__(data, kernel, likelihood, mean_function=None, num_latent_gps=None)[source]

• data = (X, Y) contains the input points [N, D] and the observations [N, P]

• kernel, likelihood, mean_function are appropriate GPflow objects

Parameters
• data (Tuple[Tensor, Tensor]) –

• kernel (Kernel) –

• likelihood (Likelihood) –

• mean_function (Optional[MeanFunction]) –

• num_latent_gps (Optional[int]) –

elbo()[source]

q_alpha, q_lambda are variational parameters, size [N, R]. This method computes the variational lower bound on the likelihood, which is:

E_{q(F)} [ log p(Y|F) ] - KL[ q(F) || p(F)]

with

q(f) = N(f | K alpha + mean, [K^-1 + diag(square(lambda))]^-1)

Return type

Tensor

maximum_log_likelihood_objective()[source]

Objective for maximum likelihood estimation. Should be maximized. E.g. log-marginal likelihood (hyperparameter likelihood) for GPR, or lower bound to the log-marginal likelihood (ELBO) for sparse and variational GPs.

Return type

Tensor

predict_f(Xnew, full_cov=False, full_output_cov=False)[source]
The posterior over F is given by

q(f) = N(f | K alpha + mean, [K^-1 + diag(lambda**2)]^-1)

Here we project this to F*, the values of the GP at Xnew, which is given by

q(F*) = N(F* | K_{*f} alpha + mean, K_{**} - K_{*f} [K_{ff} + diag(lambda**-2)]^-1 K_{f*})

Note: This model currently does not allow full output covariances

Parameters
• Xnew (Tensor) –

• full_cov (bool) –

• full_output_cov (bool) –

Return type

Tuple[Tensor, Tensor]
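The covariance in q(f) is written in terms of the precision K^-1 + diag(lambda**2), whereas the projected covariance uses K_{ff} + diag(lambda**-2). The two forms are linked by the matrix inversion lemma, which the NumPy sketch below checks numerically on the training points. The kernel matrix and the values of lambda are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# A random symmetric positive-definite matrix standing in for K.
A = rng.normal(size=(N, N))
K = A @ A.T + N * np.eye(N)

# Strictly positive values standing in for the variational parameter lambda.
lam = rng.uniform(0.5, 2.0, size=N)

# Covariance written via the precision: [K^-1 + diag(lambda**2)]^-1
precision_form = np.linalg.inv(np.linalg.inv(K) + np.diag(lam ** 2))

# Same covariance via the matrix inversion lemma:
# K - K [K + diag(lambda**-2)]^-1 K
woodbury_form = K - K @ np.linalg.solve(K + np.diag(1.0 / lam ** 2), K)

max_abs_difference = np.max(np.abs(precision_form - woodbury_form))
```

The second form avoids inverting K directly, and replacing the outer factors of K by the cross-covariances between Xnew and the training inputs gives the projected covariance stated above.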

## gpflow.models.maximum_log_likelihood_objective¶

gpflow.models.maximum_log_likelihood_objective(model, data)[source]
Parameters
• model (BayesianModel) –

• data (~Data) –

Return type

Tensor

## gpflow.models.training_loss¶

gpflow.models.training_loss(model, data)[source]
Parameters
• model (BayesianModel) –

• data (~Data) –

Return type

Tensor

## gpflow.models.training_loss_closure¶

gpflow.models.training_loss_closure(model, data, **closure_kwargs)[source]
Parameters
• model (BayesianModel) –

• data (~Data) –

Return type

Callable[[], Tensor]
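The return type Callable[[], Tensor] reflects a closure pattern: the model and data are captured once, and the result is a zero-argument function that an optimiser can call repeatedly. The pure-Python sketch below illustrates the idea with a hypothetical ToyModel and toy_ helper functions. It assumes that the training loss is the negative of the maximum-likelihood objective plus the log prior density, and it is not GPflow's implementation.

```python
from typing import Callable, Sequence, Tuple

Data = Tuple[Sequence[float], Sequence[float]]


class ToyModel:
    """Hypothetical stand-in for a model that receives its data externally."""

    def __init__(self, slope: float) -> None:
        self.slope = slope

    def maximum_log_likelihood_objective(self, data: Data) -> float:
        X, Y = data
        # Unit-variance Gaussian log-likelihood, up to an additive constant.
        return -0.5 * sum((y - self.slope * x) ** 2 for x, y in zip(X, Y))

    def log_prior_density(self) -> float:
        # Standard normal prior on the slope, up to an additive constant.
        return -0.5 * self.slope ** 2


def toy_training_loss(model: ToyModel, data: Data) -> float:
    # Assumed relationship: loss = -(objective + log prior).
    return -(model.maximum_log_likelihood_objective(data) + model.log_prior_density())


def toy_training_loss_closure(model: ToyModel, data: Data) -> Callable[[], float]:
    def closure() -> float:
        # Re-evaluated on every call, so parameter updates are picked up.
        return toy_training_loss(model, data)

    return closure


model = ToyModel(slope=1.0)
data = ([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])

loss_fn = toy_training_loss_closure(model, data)
loss_before = loss_fn()

model.slope = 1.4  # stands in for an optimiser step
loss_after = loss_fn()
```

Because the closure reads the model's current parameters each time it is called, the same loss_fn object can be handed to an optimiser and evaluated after every update.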