Multi-output Gaussian processes in GPflow¶
This notebook shows how to construct a multi-output GP model using GPflow. We will consider a regression problem for functions \(f: \mathbb{R}^D \rightarrow \mathbb{R}^P\). We assume that the dataset is of the form \((X, f_1), \dots, (X, f_P)\), that is, we observe all the outputs for a particular input location (for cases where there are not fully observed outputs for each input, see A simple demonstration of coregionalisation).
Here we assume a model of the form:
Here we have two options for \(g\): 1. The output dimensions of \(g\) share the same kernel. 2. Each output of \(g\) has a separate kernel.
In addition, we have two further suboptions for the inducing inputs of \(g\): 1. The instances of \(g\) share the same inducing inputs. 2. Each output of \(g\) has its own set of inducing inputs.
The notation is as follows:
[1]:
import numpy as np
import matplotlib.pyplot as plt
import gpflow as gpf
import gpflow.multioutput.kernels as mk
import gpflow.multioutput.features as mf
%matplotlib inline
MAXITER = gpf.test_util.notebook_niter(15000)
Generate synthetic data¶
We create a utility function to generate synthetic data. We assume that:
[2]:
N = 100 # number of points
D = 1 # number of input dimensions
M = 20 # number of inducing points
L = 2 # number of latent GPs
P = 3 # number of observations = output dimensions
[3]:
def generate_data(N=100):
X = np.random.rand(N)[:, None] * 10 - 5 # Inputs = N x D
G = np.hstack((0.5 * np.sin(3 * X) + X, 3.0 * np.cos(X) - X)) # G = N x L
W = np.array([[0.5, -0.3, 1.5], [-0.4, 0.43, 0.0]]) # L x P
F = np.matmul(G, W) # N x P
Y = F + np.random.randn(*F.shape) * [0.2, 0.2, 0.2]
return X, Y
[4]:
X, Y = generate_data(N)
We create a utility function for plotting:
[5]:
def plot_model(m, lower=-6, upper=6):
pX = np.linspace(lower, upper, 100)[:, None]
pY, pYv = m.predict_y(pX)
if pY.ndim == 3:
pY = pY[:, 0, :]
plt.plot(m.X.value, m.Y.value, 'x')
plt.gca().set_prop_cycle(None)
plt.plot(pX, pY)
for i in range(pY.shape[1]):
top = pY[:, i] + 2.0 * pYv[:, i] ** 0.5
bot = pY[:, i] - 2.0 * pYv[:, i] ** 0.5
plt.fill_between(pX[:, 0], top, bot, alpha=0.3)
plt.xlabel('X')
plt.ylabel('f')
Model the outputs \(f(x)\) directly¶
The three examples below show how to model the outputs of the model \(f(x)\) directly. Mathematically, this case is equivalent to having:
3. Separate independent kernel and separate independent features¶
[14]:
gpf.reset_default_graph_and_session()
[15]:
# Create list of kernels for each output
kern_list = [gpf.kernels.RBF(D) + gpf.kernels.Linear(1) for _ in range(P)]
# Create multioutput kernel from kernel list
kernel = mk.SeparateIndependentMok(kern_list)
# initialisation of inducing input locations, one set of locations per output
Zs = [X[np.random.permutation(len(X))[:M],...].copy() for _ in range(P)]
# initialise as list inducing features
feature_list = [gpf.features.InducingPoints(Z) for Z in Zs]
# create multioutput features from feature_list
feature = mf.SeparateIndependentMof(feature_list)
NOTE: While the inducing points are independent, there needs to be the same number of inducing points per dimension.
[16]:
# create SVGP model as usual and optimise
m = gpf.models.SVGP(X, Y, kernel, gpf.likelihoods.Gaussian(), feat=feature)
opt = gpf.train.ScipyOptimizer()
opt.minimize(m, disp=True, maxiter=MAXITER)
INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 28.645250
Number of iterations: 13090
Number of functions evaluations: 13895
INFO:tensorflow:Optimization terminated with:
Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
Objective function value: 28.645250
Number of iterations: 13090
Number of functions evaluations: 13895
[17]:
plot_model(m)
The plot below shows that we use different inducing inputs in each output dimension.
[18]:
for i in range(len(m.feature.feat_list)):
q_mu_unwhitened, q_var_unwhitened = m.predict_f(m.feature.feat_list[i].Z.value)
plt.plot(m.feature.feat_list[i].Z.value, q_mu_unwhitened[:, [i]], "o")
Model \(f(x)\) by doing inference in the \(g\) space¶
Illustration of GPflow’s multi-output capabilities¶
Multi-output kernels (MOK) class diagram¶
Multi-output features (MOF) class diagram¶
NOTE: MixedKernelSeparateMof is not implemented, but can easily be added to the framework.
The shape of Kuu and Kuf and the underlying conditional code depends on the Mof and Mok classes used.
Feature |
Kernel |
Kuu |
Kuf |
conditional |
note |
|---|---|---|---|---|---|
InducingPoints |
Mok |
MxPxMxP |
MxPxNxP |
fully_correlated_conditional |
This is the default. For certain kernels this is very inefficient. In this case q_mu and q_sqrt are 1 x MP and 1 x MP x MP |
SharedIndependentMof |
SharedIndependentMok |
M x M |
M x N |
base_conditional |
These two classes are in a sense redundant, because we can achieve the same behaviour by using the single output Kernel and InducingFeature classes. They are added for illustrative purposes. Thanks to the conditional dispatch, the most efficient code path is used. |
SeparateIndependentMof |
SharedIndependentMok |
P x M x M |
P x M x N |
P x base_conditional |
We loop P times over the base_conditional |
SharedIndependentMof |
SeparateIndependentMok |
P x M x M |
P x M x N |
P x base_conditional |
We loop P times over the base_conditional |
SeparateIndependentMof |
SeparateIndependentMok |
P x M x M |
P x M x N |
P x base_conditional |
We loop P times over the base_conditional |
SharedIndependentMof |
SeparateMixedKernel |
L x M x M |
M x L x N x P |
independent_interdomain_conditional |
Inducing outputs live in g-space |
SeparateIndependentMof |
SeparateMixedKernel |
L x M x M |
M x L x N x P |
independent_interdomain_conditional |
Very similar to the above |
MixedKernelSharedMof |
SeparateMixedKernel |
L x M x M |
L x M x N |
base_conditional |
This is the most efficient implementation for MixedKernels. The inducing outputs live in g-space. Here we use the output of the base conditional and project the mean and covariance with the mixing matrix W. |