gpflow.optimizers

gpflow.optimizers.NaturalGradient

class gpflow.optimizers.NaturalGradient(gamma, xi_transform=XiNat(), name=None)[source]

Bases: tensorflow.optimizers.Optimizer

Implements a natural gradient descent optimizer for variational models that are based on a distribution q(u) = N(q_mu, q_sqrt q_sqrtᵀ) that is parameterized by mean q_mu and lower-triangular Cholesky factor q_sqrt of the covariance.

Note that this optimizer does not implement the standard API of tf.optimizers.Optimizer. Its only public method is minimize(), which has a custom signature (var_list needs to be a list of (q_mu, q_sqrt) tuples, where q_mu and q_sqrt are gpflow.Parameter instances, not tf.Variable).

When using this class in your work, please cite:

@inproceedings{salimbeni18,
  title={Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models},
  author={Salimbeni, Hugh and Eleftheriadis, Stefanos and Hensman, James},
  booktitle={AISTATS},
  year={2018}
}

Methods

__call__(self, *args, **kw)

Call self as a function.

minimize(self, loss_fn: Callable[[], tensorflow.Tensor], …)

Minimizes the objective function of the model.

get_config

Parameters
  • gamma (Union[float, tensorflow.Tensor, ndarray]) –

  • xi_transform (XiTransform) –

__init__(self, gamma: Union[float, tensorflow.Tensor, numpy.ndarray], xi_transform: gpflow.optimizers.natgrad.XiTransform = XiNat(), name=None)[source]
Parameters
  • gamma (Union[float, tensorflow.Tensor, ndarray]) – natgrad step length

  • xi_transform (XiTransform) – default ξ transform (can be overridden in the call to minimize()) The XiNat default choice works well in general.

_natgrad_apply_gradients(self, q_mu_grad: tensorflow.Tensor, q_sqrt_grad: tensorflow.Tensor, q_mu: gpflow.base.Parameter, q_sqrt: gpflow.base.Parameter, xi_transform: Union[gpflow.optimizers.natgrad.XiTransform, NoneType] = None)[source]

This function does the backward step on the q_mu and q_sqrt parameters, given the gradients of the loss function with respect to their unconstrained variables. I.e., it expects the arguments to come from

with tf.GradientTape() as tape:
    loss = loss_function()
q_mu_grad, q_sqrt_grad = tape.gradient(loss, [q_mu, q_sqrt])

(Note that tape.gradient() returns the gradients in unconstrained space!)

Implements equation [10] from

@inproceedings{salimbeni18,
  title={Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models},
  author={Salimbeni, Hugh and Eleftheriadis, Stefanos and Hensman, James},
  booktitle={AISTATS},
  year={2018}
}

In addition, for convenience with the rest of GPflow, this code computes ∂L/∂η using the chain rule (the following assumes a numerator layout where the gradient is a row vector; note that TensorFlow actually returns a column vector), where L is the loss:

∂L/∂η = (∂L / ∂[q_mu, q_sqrt])(∂[q_mu, q_sqrt] / ∂η)

In total there are three derivative calculations. The natural gradient of L w.r.t. ξ is ∂L/∂ξ = (∂ξ/∂θ) [(∂L/∂[q_mu, q_sqrt]) (∂[q_mu, q_sqrt]/∂η)]ᵀ

Note that if ξ = θ (i.e. [q_mu, q_sqrt]) some of these calculations are the identity. In the code η = eta, ξ = xi, θ = nat.

Parameters
  • q_mu_grad (tensorflow.Tensor) – gradient of loss w.r.t. q_mu (in unconstrained space)

  • q_sqrt_grad (tensorflow.Tensor) – gradient of loss w.r.t. q_sqrt (in unconstrained space)

  • q_mu (Parameter) – parameter for the mean of q(u)

  • q_sqrt (Parameter) – parameter for the square root of the covariance of q(u)

  • xi_transform (Optional[XiTransform]) – the ξ transform to use (self.xi_transform if not specified)
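To illustrate the update above (this is a toy sketch, not GPflow code; all names are made up): for a one-dimensional Gaussian q = N(m, s) with mean parameters η = (m, s + m²) and natural parameters θ = (m/s, −1/(2s)), the natural-gradient step θ ← θ − γ ∂L/∂η can be worked out analytically when the loss is KL(q ‖ p) to a fixed Gaussian p:

```python
# Toy natural-gradient step for q = N(m, s), loss L = KL(q || p) with p = N(m0, s0).
# Mean params:    eta   = (m, s + m**2)
# Natural params: theta = (m / s, -1 / (2 * s))

def natgrad_step(m, s, m0, s0, gamma):
    # Analytic gradients of L = 0.5 * (log(s0/s) + (s + (m - m0)**2) / s0 - 1)
    # with respect to (m, s):
    dL_dm = (m - m0) / s0
    dL_ds = 0.5 * (1.0 / s0 - 1.0 / s)

    # Chain rule to mean parameters eta = (m, s + m**2),
    # i.e. m = eta1 and s = eta2 - eta1**2:
    dL_deta1 = dL_dm - 2.0 * m * dL_ds
    dL_deta2 = dL_ds

    # Natural-gradient step: theta <- theta - gamma * dL/deta
    theta1 = m / s - gamma * dL_deta1
    theta2 = -1.0 / (2.0 * s) - gamma * dL_deta2

    # Convert natural parameters back to (m, s):
    s_new = -1.0 / (2.0 * theta2)
    m_new = theta1 * s_new
    return m_new, s_new

# With gamma = 1, a single step lands exactly on the target Gaussian,
# illustrating the "analytic optimal solution for gamma=1" property of XiNat:
m_new, s_new = natgrad_step(m=0.3, s=2.0, m0=-1.0, s0=0.5, gamma=1.0)
```

For this conjugate toy loss, one step with gamma=1 recovers (m0, s0) exactly; for non-conjugate models, smaller gamma values are typically used.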

_natgrad_steps(self, loss_fn: Callable[[], tensorflow.Tensor], parameters: Sequence[Tuple[gpflow.base.Parameter, gpflow.base.Parameter, Union[gpflow.optimizers.natgrad.XiTransform, NoneType]]])[source]

Computes gradients of loss_fn() w.r.t. q_mu and q_sqrt, and updates these parameters using the natgrad backwards step, for all sets of variational parameters passed in.

Parameters
  • loss_fn (Callable[[], tensorflow.Tensor]) – Loss function.

  • parameters (Sequence[Tuple[Parameter, Parameter, Optional[XiTransform]]]) – List of tuples (q_mu, q_sqrt, xi_transform)

minimize(self, loss_fn: Callable[[], tensorflow.Tensor], var_list: Sequence[Union[Tuple[gpflow.base.Parameter, gpflow.base.Parameter], Tuple[gpflow.base.Parameter, gpflow.base.Parameter, ForwardRef('XiTransform')]]])[source]

Minimizes the objective function of the model. The natural gradient optimizer works with variational parameters only.

Parameters
  • loss_fn (Callable[[], tensorflow.Tensor]) – Loss function.

  • var_list (Sequence[Union[Tuple[Parameter, Parameter], Tuple[Parameter, Parameter, XiTransform]]]) –

List of tuple pairs of variational parameters, or triples that additionally specify a ξ transformation. If ξ is not specified, self.xi_transform will be used. For example, var_list could be

var_list = [
    (q_mu1, q_sqrt1),
    (q_mu2, q_sqrt2, XiSqrtMeanVar()),
]

GPflow implements the XiNat (default) and XiSqrtMeanVar transformations for parameters. Custom transformations that implement the XiTransform interface are also possible.

gpflow.optimizers.SamplingHelper

class gpflow.optimizers.SamplingHelper(target_log_prob_fn, parameters)[source]

Bases: object

This helper makes it easy to read from variables that have been assigned a prior, and to write values back to the same variables.

Example:

model = ...  # Create a GPflow model
hmc_helper = SamplingHelper(model.log_posterior_density, model.trainable_parameters)

target_log_prob_fn = hmc_helper.target_log_prob_fn
current_state = hmc_helper.current_state

hmc = tfp.mcmc.HamiltonianMonteCarlo(target_log_prob_fn=target_log_prob_fn, ...)
adaptive_hmc = tfp.mcmc.SimpleStepSizeAdaptation(hmc, ...)

@tf.function
def run_chain_fn():
    return tfp.mcmc.sample_chain(
        num_samples, num_burnin_steps, current_state, kernel=adaptive_hmc)

hmc_samples = run_chain_fn()
parameter_samples = hmc_helper.convert_to_constrained_values(hmc_samples)

Attributes
current_state

Return the current state of the unconstrained variables, used in HMC.

target_log_prob_fn

The target log probability, adjusted to allow for optimisation to occur on the tracked unconstrained underlying variables.

Methods

convert_to_constrained_values(self, hmc_samples)

Converts list of unconstrained values in hmc_samples to constrained versions.

Parameters
  • target_log_prob_fn (Callable[[], tensorflow.Tensor]) –

  • parameters (Sequence[Parameter]) –

__init__(self, target_log_prob_fn: Callable[[], tensorflow.Tensor], parameters: Sequence[gpflow.base.Parameter])[source]
Parameters
  • target_log_prob_fn (Callable[[], tensorflow.Tensor]) – a callable which returns the log-density of the model under the target distribution; needs to implicitly depend on the parameters. E.g. model.log_posterior_density.

  • parameters (Sequence[Parameter]) – List of gpflow.Parameter used as a state of the Markov chain. E.g. model.trainable_parameters Note that each parameter must have been given a prior.

convert_to_constrained_values(self, hmc_samples)[source]

Converts list of unconstrained values in hmc_samples to constrained versions. Each value in the list corresponds to an entry in parameters passed to the constructor; for parameters that have a transform, the constrained representation is returned.
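To illustrate what "unconstrained" vs "constrained" means here: GPflow's positive parameters use a softplus transform by default (this is configurable), so HMC samples live on the real line and must be mapped back to the constrained space. A minimal stdlib-only sketch of that mapping (not the SamplingHelper internals):

```python
import math

def softplus(x):
    """Unconstrained -> constrained (positive): log(1 + exp(x)), numerically stable."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def inv_softplus(y):
    """Constrained (positive) -> unconstrained: log(exp(y) - 1)."""
    # For large y, softplus is effectively the identity; guard against expm1 overflow.
    return math.log(math.expm1(y)) if y < 20.0 else y

# An unconstrained HMC sample maps to a strictly positive parameter value:
unconstrained = -1.5
constrained = softplus(unconstrained)

# The round trip recovers the unconstrained value:
recovered = inv_softplus(constrained)
```

convert_to_constrained_values applies each parameter's own bijector in this direction, sample by sample.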

property current_state

Return the current state of the unconstrained variables, used in HMC.

property target_log_prob_fn

The target log probability, adjusted to allow for optimisation to occur on the tracked unconstrained underlying variables.

gpflow.optimizers.Scipy

class gpflow.optimizers.Scipy[source]

Bases: object

Methods

minimize(self, closure: Callable[[], tensorflow.Tensor], …)

Minimize is a wrapper around the scipy.optimize.minimize function, handling the packing and unpacking of a list of shaped variables on the TensorFlow side vs. the flat numpy array required on the SciPy side.

assign_tensors

callback_func

eval_func

initial_parameters

pack_tensors

unpack_tensors

minimize(self, closure: Callable[[], tensorflow.Tensor], variables: Sequence[tensorflow.Variable], method: Union[str, NoneType] = 'L-BFGS-B', step_callback: Union[Callable[[int, Sequence[tensorflow.Variable], Sequence[tensorflow.Tensor]], NoneType], NoneType] = None, compile: bool = True, **scipy_kwargs) → scipy.optimize.optimize.OptimizeResult[source]

Minimize is a wrapper around the scipy.optimize.minimize function handling the packing and unpacking of a list of shaped variables on the TensorFlow side vs. the flat numpy array required on the Scipy side.

Args:
    closure: A closure that re-evaluates the model, returning the loss
        to be minimized.
    variables: The list (tuple) of variables to be optimized
        (typically model.trainable_variables).
    method: The type of solver to use in SciPy. Defaults to "L-BFGS-B".
    step_callback: If not None, a callable that gets called once after
        each optimisation step. The callable is passed the arguments
        step, variables, and values. step is the optimisation step
        counter, variables is the list of trainable variables as above,
        and values is the corresponding list of tensors of matching
        shape that contains their value at this optimisation step.
    compile: If True, wraps the evaluation function (the passed closure
        as well as its gradient computation) inside a tf.function(),
        which will improve optimization speed in most cases.
    scipy_kwargs: Arguments passed through to scipy.optimize.minimize.

Note that SciPy's minimize() takes a callback argument, but you probably want to use our wrapper and pass in step_callback.

Returns:

The optimization result represented as a Scipy OptimizeResult object. See the Scipy documentation for description of attributes.

Parameters
  • closure (Callable[[], tensorflow.Tensor]) –

  • variables (Sequence[tensorflow.Variable]) –

  • method (Optional[str]) –

  • step_callback (Optional[Callable[[int, Sequence[tensorflow.Variable], Sequence[tensorflow.Tensor]], None]]) –

  • compile (bool) –

Return type

OptimizeResult
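The packing and unpacking that this wrapper performs can be illustrated with plain NumPy and SciPy. The pack/unpack helpers below are written from scratch for the sketch (they mirror the role of the pack_tensors/unpack_tensors helpers listed above, but are not GPflow's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def pack(arrays):
    """Flatten a list of shaped arrays into one 1-D vector (the SciPy side)."""
    return np.concatenate([a.ravel() for a in arrays])

def unpack(x, shapes):
    """Split a flat vector back into arrays of the given shapes (the model side)."""
    arrays, i = [], 0
    for shape in shapes:
        n = int(np.prod(shape))
        arrays.append(x[i:i + n].reshape(shape))
        i += n
    return arrays

# Two "variables" of different shapes, as a model might hold:
shapes = [(2, 3), (3,)]

def loss(x):
    a, b = unpack(x, shapes)
    # Simple quadratic with its minimum at a == 1, b == -2:
    return np.sum((a - 1.0) ** 2) + np.sum((b + 2.0) ** 2)

x0 = pack([np.zeros((2, 3)), np.zeros(3)])
result = minimize(loss, x0, method="L-BFGS-B")
a_opt, b_opt = unpack(result.x, shapes)
```

GPflow's wrapper does the same round trip on every objective evaluation, additionally supplying the gradient computed by TensorFlow.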

gpflow.optimizers.XiNat

class gpflow.optimizers.XiNat[source]

Bases: gpflow.optimizers.natgrad.XiTransform

This is the default transform. Using the natural parameterization directly saves the forward-mode gradient computation, and also gives the analytic optimal solution for gamma=1 in the case of a Gaussian likelihood.

Methods

meanvarsqrt_to_xi(mean, varsqrt)

Transforms the parameter mean and varsqrt to xi1, xi2

naturals_to_xi(nat1, nat2)

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

xi_to_meanvarsqrt(xi1, xi2)

Transforms the parameter xi1, xi2 to mean, varsqrt

static meanvarsqrt_to_xi(mean, varsqrt)[source]

Transforms the parameter mean and varsqrt to xi1, xi2

Parameters
  • mean – the mean parameter (N, D)

  • varsqrt – the varsqrt parameter (D, N, N)

Returns

tuple (xi1, xi2), the xi parameters (N, D), (D, N, N)

static naturals_to_xi(nat1, nat2)[source]

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

Parameters
  • nat1 – the θ₁ parameter

  • nat2 – the θ₂ parameter

Returns

tuple xi1, xi2

static xi_to_meanvarsqrt(xi1, xi2)[source]

Transforms the parameter xi1, xi2 to mean, varsqrt

Parameters
  • xi1 – the ξ₁ parameter

  • xi2 – the ξ₂ parameter

Returns

tuple (mean, varsqrt), the meanvarsqrt parameters

gpflow.optimizers.XiSqrtMeanVar

class gpflow.optimizers.XiSqrtMeanVar[source]

Bases: gpflow.optimizers.natgrad.XiTransform

This transform performs natural gradient descent directly in the (mean, varsqrt) parameterization of the model, which saves the conversion to and from ξ.

Methods

meanvarsqrt_to_xi(mean, varsqrt)

Transforms the parameter mean and varsqrt to xi1, xi2

naturals_to_xi(nat1, nat2)

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

xi_to_meanvarsqrt(xi1, xi2)

Transforms the parameter xi1, xi2 to mean, varsqrt

static meanvarsqrt_to_xi(mean, varsqrt)[source]

Transforms the parameter mean and varsqrt to xi1, xi2

Parameters
  • mean – the mean parameter (N, D)

  • varsqrt – the varsqrt parameter (D, N, N)

Returns

tuple (xi1, xi2), the xi parameters (N, D), (D, N, N)

static naturals_to_xi(nat1, nat2)[source]

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

Parameters
  • nat1 – the θ₁ parameter

  • nat2 – the θ₂ parameter

Returns

tuple xi1, xi2

static xi_to_meanvarsqrt(xi1, xi2)[source]

Transforms the parameter xi1, xi2 to mean, varsqrt

Parameters
  • xi1 – the ξ₁ parameter

  • xi2 – the ξ₂ parameter

Returns

tuple (mean, varsqrt), the meanvarsqrt parameters

gpflow.optimizers.XiTransform

class gpflow.optimizers.XiTransform[source]

Bases: object

XiTransform is the base class that implements the three transformations necessary for the natural gradient calculation with respect to any parameterization. This class does not handle any shape information, but it is assumed that the parameter pairs are always of shape (N, D) and (D, N, N).

Methods

meanvarsqrt_to_xi(mean, varsqrt)

Transforms the parameter mean and varsqrt to xi1, xi2

naturals_to_xi(nat1, nat2)

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

xi_to_meanvarsqrt(xi1, xi2)

Transforms the parameter xi1, xi2 to mean, varsqrt

abstract static meanvarsqrt_to_xi(mean, varsqrt)[source]

Transforms the parameter mean and varsqrt to xi1, xi2

Parameters
  • mean – the mean parameter (N, D)

  • varsqrt – the varsqrt parameter (D, N, N)

Returns

tuple (xi1, xi2), the xi parameters (N, D), (D, N, N)

abstract static naturals_to_xi(nat1, nat2)[source]

Applies the transform so that nat1, nat2 is mapped to xi1, xi2

Parameters
  • nat1 – the θ₁ parameter

  • nat2 – the θ₂ parameter

Returns

tuple xi1, xi2

abstract static xi_to_meanvarsqrt(xi1, xi2)[source]

Transforms the parameter xi1, xi2 to mean, varsqrt

Parameters
  • xi1 – the ξ₁ parameter

  • xi2 – the ξ₂ parameter

Returns

tuple (mean, varsqrt), the meanvarsqrt parameters
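A custom transform provides the three static methods above. The following is a from-scratch NumPy sketch (the class name and construction code are illustrative, not GPflow's implementation) for the case ξ = (mean, varsqrt), so meanvarsqrt_to_xi and xi_to_meanvarsqrt are the identity and only naturals_to_xi does real work, using the Gaussian identities nat1 = S⁻¹m, nat2 = −½S⁻¹:

```python
import numpy as np

class XiMeanVarSqrt:
    """Sketch of an XiTransform-style class with xi = (mean, varsqrt).

    Shapes follow the docs: mean/xi1 are (N, D), varsqrt/xi2 are (D, N, N).
    """

    @staticmethod
    def meanvarsqrt_to_xi(mean, varsqrt):
        # xi is the (mean, varsqrt) pair itself, so this is the identity.
        return mean, varsqrt

    @staticmethod
    def xi_to_meanvarsqrt(xi1, xi2):
        return xi1, xi2

    @staticmethod
    def naturals_to_xi(nat1, nat2):
        # Gaussian natural parameters: nat1 = S^{-1} m, nat2 = -0.5 * S^{-1},
        # so S = -0.5 * nat2^{-1} and m = S @ nat1, per output dimension d.
        D = nat2.shape[0]
        cov = np.stack([-0.5 * np.linalg.inv(nat2[d]) for d in range(D)])
        mean = np.stack([cov[d] @ nat1[:, d] for d in range(D)], axis=1)
        varsqrt = np.linalg.cholesky(cov)  # lower-triangular square root
        return mean, varsqrt

# Round trip: build naturals from a known (mean, cov), then map back.
N, D = 4, 2
rng = np.random.default_rng(0)
mean = rng.standard_normal((N, D))
A = rng.standard_normal((D, N, N))
cov = A @ np.transpose(A, (0, 2, 1)) + N * np.eye(N)  # positive definite
nat2 = np.stack([-0.5 * np.linalg.inv(cov[d]) for d in range(D)])
nat1 = np.stack([np.linalg.inv(cov[d]) @ mean[:, d] for d in range(D)], axis=1)
mean_back, varsqrt_back = XiMeanVarSqrt.naturals_to_xi(nat1, nat2)
```

The round trip recovers mean and a Cholesky factor of cov, which is the consistency property any XiTransform must satisfy between its three maps.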

gpflow.optimizers.natgrad