{
"cells": [
{

"cell_type": "markdown", "metadata": {}, "source": [

"Discussion of the GP marginal likelihood upper bound\n",
"--\n",
"\n",
"Mark van der Wilk, August 2017  \n",
"Small edits by James Hensman, 2017\n",
"\n",
"See [gp_upper](https://github.com/markvdw/gp_upper) for code to tighten the upper bound through optimisation, and a more comprehensive discussion."

]
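}, {

"cell_type": "markdown", "metadata": {}, "source": [
"For reference, the two bounds compared in this notebook can be written as follows. This is a sketch that assumes the standard forms: the collapsed lower bound of Titsias (2009), which SGPR maximises, and the upper bound of Titsias (2014), on which we understand `compute_upper_bound` to be based. Check the GPflow source for the exact implementation. Here $\\mathbf{K}_{ff}$ is the $N \\times N$ kernel matrix of the training inputs, $\\mathbf{Q}_{ff} = \\mathbf{K}_{fu}\\mathbf{K}_{uu}^{-1}\\mathbf{K}_{uf}$ is its Nyström approximation through the $M$ inducing points, $\\sigma^2$ is the noise variance and $t = \\operatorname{tr}(\\mathbf{K}_{ff} - \\mathbf{Q}_{ff})$.\n",
"\n",
"$$\n",
"\\log p(\\mathbf{y}) \\geq \\log \\mathcal{N}\\left(\\mathbf{y} \\mid \\mathbf{0}, \\mathbf{Q}_{ff} + \\sigma^2 \\mathbf{I}\\right) - \\frac{t}{2\\sigma^2}\n",
"$$\n",
"\n",
"$$\n",
"\\log p(\\mathbf{y}) \\leq -\\frac{N}{2}\\log 2\\pi - \\frac{1}{2}\\log\\left|\\mathbf{Q}_{ff} + \\sigma^2 \\mathbf{I}\\right| - \\frac{1}{2}\\mathbf{y}^\\top\\left(\\mathbf{Q}_{ff} + (\\sigma^2 + t)\\mathbf{I}\\right)^{-1}\\mathbf{y}\n",
"$$\n",
"\n",
"The upper bound follows from two facts. First, $\\mathbf{Q}_{ff} \\preceq \\mathbf{K}_{ff}$, so $-\\frac{1}{2}\\log|\\mathbf{Q}_{ff} + \\sigma^2\\mathbf{I}|$ is at least $-\\frac{1}{2}\\log|\\mathbf{K}_{ff} + \\sigma^2\\mathbf{I}|$. Second, $\\mathbf{K}_{ff} - \\mathbf{Q}_{ff} \\preceq t\\,\\mathbf{I}$, which bounds the quadratic term. Both bounds equal the exact log marginal likelihood when $t = 0$."
]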

}, {

"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [

"from __future__ import print_function  # must be the first statement in the cell\n",
"\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"matplotlib.rcParams['figure.figsize'] = (12, 6)\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import gpflow\n",
"\n",
"from FITCvsVFE import getTrainingTestData  # local helper module FITCvsVFE.py"

]

}, {

"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [

"X, Y, Xt, Yt = getTrainingTestData()"

]

}, {

"cell_type": "code", "execution_count": 11, "metadata": {

"collapsed": true

}, "outputs": [], "source": [

"def plot_model(m, name=''):\n",
"    pX = np.linspace(-3, 9, 100)[:, None]\n",
"    pY, pYv = m.predict_y(pX)\n",
"    plt.plot(X, Y, 'x')\n",
"    plt.plot(pX, pY)\n",
"    try:\n",
"        # sparse models: mark the inducing inputs Z along the x-axis\n",
"        plt.plot(m.feature.Z.value, m.feature.Z.value * 0, 'o')\n",
"    except AttributeError:\n",
"        # GPR has no inducing points, so there is nothing to mark\n",
"        pass\n",
"    two_sigma = (2.0 * pYv ** 0.5)[:, 0]\n",
"    plt.fill_between(pX[:, 0], pY[:, 0] - two_sigma, pY[:, 0] + two_sigma, alpha=0.15)\n",
"    lml = m.compute_log_likelihood()\n",
"    plt.title('%s (lml = %f)' % (name, lml))\n",
"    return lml"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"## Full model"

]

}, {

"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [

{

"name": "stdout", "output_type": "stream", "text": [

"INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 23.966658\n",
"  Number of iterations: 13\n",
"  Number of functions evaluations: 18\n"

]

}, {

"data": {

"text/plain": [

"<matplotlib.figure.Figure at 0x1320b0c18>"

]

}, "metadata": {}, "output_type": "display_data"

}

], "source": [

"f = gpflow.models.GPR(X, Y, gpflow.kernels.RBF(1))\n",
"f.compile()\n",
"gpflow.train.ScipyOptimizer().minimize(f)\n",
"full_lml = plot_model(f)"

]

}, {

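"cell_type": "markdown", "metadata": {}, "source": [

"## Upper bounds for sparse variational models\n",
"As a first investigation, we compute the upper bound for models trained with the sparse variational GP approximation (SGPR), for an increasing number of inducing points $M$. For each $M$, the hyperparameters and the inducing inputs are optimised jointly by maximising the lower bound."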

]
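}, {

"cell_type": "markdown", "metadata": {}, "source": [
"It may help to see these quantities computed directly before looking at the GPflow results. The following is a minimal NumPy sketch, not GPflow's implementation. It assumes a zero-mean GP with a squared-exponential kernel on 1-D inputs and the standard forms of the bounds (Titsias, 2009, for the lower bound; Titsias, 2014, for the upper bound). The helpers `rbf`, `log_normal` and `lml_bounds`, the toy data and all hyperparameter values are hypothetical. Whatever inducing inputs `z` are chosen, the exact log marginal likelihood should lie between the two bounds.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rbf(a, b, var, ell):\n",
"    # squared-exponential kernel matrix between two sets of 1-D inputs\n",
"    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)\n",
"\n",
"def log_normal(y, cov):\n",
"    # log N(y | 0, cov), computed through a Cholesky factor\n",
"    L = np.linalg.cholesky(cov)\n",
"    a = np.linalg.solve(L, y)\n",
"    return -0.5 * a.dot(a) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)\n",
"\n",
"def lml_bounds(x, y, z, var, ell, noise, jitter=1e-6):\n",
"    # returns (lower bound, exact LML, upper bound) for inducing inputs z\n",
"    Kff, I = rbf(x, x, var, ell), np.eye(len(x))\n",
"    Kuu = rbf(z, z, var, ell) + jitter * np.eye(len(z))\n",
"    A = np.linalg.solve(np.linalg.cholesky(Kuu), rbf(z, x, var, ell))\n",
"    Qff = A.T.dot(A)  # Nystrom approximation of Kff\n",
"    t = np.trace(Kff - Qff)  # prior variance not captured by Qff\n",
"    lower = log_normal(y, Qff + noise * I) - 0.5 * t / noise\n",
"    # upper bound: log-det of Qff + noise*I, quadratic term with Qff + (noise + t)*I\n",
"    b = np.linalg.solve(np.linalg.cholesky(Qff + (noise + t) * I), y)\n",
"    logdet = 2 * np.log(np.diag(np.linalg.cholesky(Qff + noise * I))).sum()\n",
"    upper = -0.5 * (b.dot(b) + logdet + len(y) * np.log(2 * np.pi))\n",
"    return lower, log_normal(y, Kff + noise * I), upper\n",
"\n",
"rng = np.random.RandomState(0)  # hypothetical toy data\n",
"x = np.linspace(0, 6, 50)\n",
"y = np.sin(x) + 0.1 * rng.randn(50)\n",
"\n",
"for m in [2, 4, 8]:\n",
"    lo, ex, up = lml_bounds(x, y, np.linspace(0, 6, m), 1.0, 1.0, 0.01)\n",
"    print('M = %d: lower %.3f <= exact %.3f <= upper %.3f' % (m, lo, ex, up))\n",
"```"
]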

}, {

"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [

{

"name": "stdout", "output_type": "stream", "text": [

"INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 49.220787\n",
"  Number of iterations: 25\n",
"  Number of functions evaluations: 36\n",
"4 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 40.211568\n",
"  Number of iterations: 52\n",
"  Number of functions evaluations: 79\n",
"5 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 37.583478\n",
"  Number of iterations: 35\n",
"  Number of functions evaluations: 43\n",
"6 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 30.092951\n",
"  Number of iterations: 32\n",
"  Number of functions evaluations: 49\n",
"7 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 26.316172\n",
"  Number of iterations: 33\n",
"  Number of functions evaluations: 45\n",
"8 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 24.957100\n",
"  Number of iterations: 30\n",
"  Number of functions evaluations: 36\n",
"9 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 24.310472\n",
"  Number of iterations: 35\n",
"  Number of functions evaluations: 41\n",
"10 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 24.069186\n",
"  Number of iterations: 45\n",
"  Number of functions evaluations: 63\n",
"11 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 24.014035\n",
"  Number of iterations: 64\n",
"  Number of functions evaluations: 81\n",
"12 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.981403\n",
"  Number of iterations: 58\n",
"  Number of functions evaluations: 72\n",
"13 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.979629\n",
"  Number of iterations: 76\n",
"  Number of functions evaluations: 92\n",
"14 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.979570\n",
"  Number of iterations: 67\n",
"  Number of functions evaluations: 76\n",
"15 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.966980\n",
"  Number of iterations: 175\n",
"  Number of functions evaluations: 209\n",
"16 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.966979\n",
"  Number of iterations: 236\n",
"  Number of functions evaluations: 278\n",
"17 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.966959\n",
"  Number of iterations: 163\n",
"  Number of functions evaluations: 193\n",
"18 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 23.966942\n",
"  Number of iterations: 127\n",
"  Number of functions evaluations: 158\n",
"19 "

]

}

], "source": [

"Ms = np.arange(4, 20, 1)\n",
"vfe_lml = []\n",
"vupper_lml = []\n",
"vfe_hyps = []\n",
"for M in Ms:\n",
"    # initialise the M inducing inputs at the first M training inputs\n",
"    Zinit = X[:M, :].copy()\n",
"    vfe = gpflow.models.SGPR(X, Y, gpflow.kernels.RBF(1), Zinit)\n",
"    vfe.compile()\n",
"    # optimise hyperparameters and inducing inputs jointly\n",
"    gpflow.train.ScipyOptimizer().minimize(vfe, disp=False)\n",
"    \n",
"    vfe_lml.append(vfe.compute_log_likelihood())\n",
"    vupper_lml.append(vfe.compute_upper_bound())\n",
"    vfe_hyps.append({p.full_name: p.read_value() for p in vfe.trainable_parameters})\n",
"    print('%i' % M, end=' ')\n",
"vfe_hyps = pd.DataFrame(vfe_hyps)"

]

}, {

"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [

{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x13d2b2668>"

]

}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"

}, {

"data": {

"text/plain": [

"<matplotlib.figure.Figure at 0x12fd50710>"

]

}, "metadata": {}, "output_type": "display_data"

}

], "source": [

"plt.plot(Ms, vfe_lml, label='lower')\n",
"plt.plot(Ms, vupper_lml, label='upper')\n",
"plt.axhline(full_lml, label='full', alpha=0.3)\n",
"plt.xlabel('Number of inducing points')\n",
"plt.ylabel('LML estimate')\n",
"plt.legend()\n",
"plt.title('LML bounds for models trained with SGPR')"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"We see that the lower bound increases as more inducing points are added. Note that the upper bound does _not_ decrease monotonically! This is because, as we train each sparse model, we also obtain better estimates of the hyperparameters. The upper bound is therefore evaluated at a different hyperparameter setting for each $M$, and for some of these settings it is looser. The upper bound also converges to the true LML more slowly than the lower bound."

]
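}, {

"cell_type": "markdown", "metadata": {}, "source": [
"The dependence of the bounds on the hyperparameters can be checked in isolation. This self-contained NumPy sketch is not GPflow's implementation. It assumes a squared-exponential kernel and the bound forms of Titsias (2009, 2014). The helpers `rbf`, `log_normal` and `lml_bounds`, the toy data and the hyperparameter values are hypothetical. The three inducing inputs are held fixed and only the lengthscale changes, yet the gap between the upper and lower bounds changes considerably.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rbf(a, b, var, ell):\n",
"    # squared-exponential kernel matrix between two sets of 1-D inputs\n",
"    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)\n",
"\n",
"def log_normal(y, cov):\n",
"    # log N(y | 0, cov), computed through a Cholesky factor\n",
"    L = np.linalg.cholesky(cov)\n",
"    a = np.linalg.solve(L, y)\n",
"    return -0.5 * a.dot(a) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)\n",
"\n",
"def lml_bounds(x, y, z, var, ell, noise, jitter=1e-6):\n",
"    # returns (lower bound, exact LML, upper bound) for inducing inputs z\n",
"    Kff, I = rbf(x, x, var, ell), np.eye(len(x))\n",
"    Kuu = rbf(z, z, var, ell) + jitter * np.eye(len(z))\n",
"    A = np.linalg.solve(np.linalg.cholesky(Kuu), rbf(z, x, var, ell))\n",
"    Qff = A.T.dot(A)  # Nystrom approximation of Kff\n",
"    t = np.trace(Kff - Qff)  # prior variance not captured by Qff\n",
"    lower = log_normal(y, Qff + noise * I) - 0.5 * t / noise\n",
"    # upper bound: log-det of Qff + noise*I, quadratic term with Qff + (noise + t)*I\n",
"    b = np.linalg.solve(np.linalg.cholesky(Qff + (noise + t) * I), y)\n",
"    logdet = 2 * np.log(np.diag(np.linalg.cholesky(Qff + noise * I))).sum()\n",
"    upper = -0.5 * (b.dot(b) + logdet + len(y) * np.log(2 * np.pi))\n",
"    return lower, log_normal(y, Kff + noise * I), upper\n",
"\n",
"rng = np.random.RandomState(0)  # hypothetical toy data\n",
"x = np.linspace(0, 6, 50)\n",
"y = np.sin(x) + 0.1 * rng.randn(50)\n",
"\n",
"z = np.array([0.0, 3.0, 6.0])  # the same three inducing inputs throughout\n",
"gaps = {}\n",
"for ell in [0.3, 1.0, 3.0]:\n",
"    lo, ex, up = lml_bounds(x, y, z, 1.0, ell, 0.01)\n",
"    gaps[ell] = up - lo\n",
"    print('lengthscale %.1f: upper - lower = %.2f' % (ell, gaps[ell]))\n",
"```"
]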

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"## Upper bounds for fixed hyperparameters\n",
"Here, we train sparse models with the kernel and likelihood hyperparameters fixed to the values found by the last SGPR model above (with $M = 19$). Only the inducing inputs are optimised."

]

}, {

"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [

{

"name": "stdout", "output_type": "stream", "text": [

"INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 62.680716\n",
"  Number of iterations: 9\n",
"  Number of functions evaluations: 16\n",
"3 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 60.882546\n",
"  Number of iterations: 13\n",
"  Number of functions evaluations: 17\n",
"4 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.697017\n",
"  Number of iterations: 9\n",
"  Number of functions evaluations: 11\n",
"5 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.065631\n",
"  Number of iterations: 10\n",
"  Number of functions evaluations: 11\n",
"6 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.053159\n",
"  Number of iterations: 12\n",
"  Number of functions evaluations: 14\n",
"7 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023861\n",
"  Number of iterations: 19\n",
"  Number of functions evaluations: 22\n",
"8 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023374\n",
"  Number of iterations: 21\n",
"  Number of functions evaluations: 24\n",
"9 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023338\n",
"  Number of iterations: 20\n",
"  Number of functions evaluations: 24\n",
"10 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023335\n",
"  Number of iterations: 2\n",
"  Number of functions evaluations: 5\n",
"11 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023334\n",
"  Number of iterations: 1\n",
"  Number of functions evaluations: 3\n",
"12 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023330\n",
"  Number of iterations: 5\n",
"  Number of functions evaluations: 8\n",
"13 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023329\n",
"  Number of iterations: 1\n",
"  Number of functions evaluations: 3\n",
"14 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023328\n",
"  Number of iterations: 1\n",
"  Number of functions evaluations: 4\n",
"15 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 59.023326\n",
"  Number of iterations: 1\n",
"  Number of functions evaluations: 3\n",
"16 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023322\n",
"  Number of iterations: 3\n",
"  Number of functions evaluations: 6\n",
"17 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023321\n",
"  Number of iterations: 0\n",
"  Number of functions evaluations: 1\n",
"18 INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'\n",
"  Objective function value: 59.023321\n",
"  Number of iterations: 0\n",
"  Number of functions evaluations: 1\n",
"19 "

]

}

], "source": [

"fMs = np.arange(3, 20, 1)\n",
"fvfe_lml = []  # lower bounds with fixed hyperparameters\n",
"fvupper_lml = []  # upper bounds with fixed hyperparameters\n",
"for M in fMs:\n",
"    # reuse the previous model's inducing inputs, topped up with random training inputs if fewer than M\n",
"    Zinit = vfe.feature.Z.read_value()[:M, :].copy()\n",
"    Zinit = np.vstack((Zinit, X[np.random.permutation(len(X))[:(M - len(Zinit))], :].copy()))\n",
"    # copy all parameter values of the previous model, replacing Z\n",
"    init_params = {p.full_name: p.read_value() for p in vfe.parameters}\n",
"    init_params['SGPR/feature/Z'] = Zinit\n",
"    vfe = gpflow.models.SGPR(X, Y, gpflow.kernels.RBF(1), Zinit)\n",
"    vfe.assign(init_params)\n",
"    # freeze the kernel and likelihood hyperparameters so that only Z is optimised\n",
"    vfe.kern.variance.set_trainable(False)\n",
"    vfe.kern.lengthscales.set_trainable(False)\n",
"    vfe.likelihood.set_trainable(False)\n",
"    vfe.compile()\n",
"    gpflow.train.ScipyOptimizer().minimize(vfe, disp=False)\n",
"    \n",
"    fvfe_lml.append(vfe.compute_log_likelihood())\n",
"    fvupper_lml.append(vfe.compute_upper_bound())\n",
"    print('%i' % M, end=' ')"

]

}, {

"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [

{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x146fe2f28>"

]

}, "execution_count": 17, "metadata": {}, "output_type": "execute_result"

}, {

"data": {

"text/plain": [

"<matplotlib.figure.Figure at 0x14703e668>"

]

}, "metadata": {}, "output_type": "display_data"

}

], "source": [

"plt.plot(fMs, fvfe_lml, label='lower')\n",
"plt.plot(fMs, fvupper_lml, label='upper')\n",
"plt.axhline(full_lml, label='full', alpha=0.3)\n",
"plt.xlabel('Number of inducing points')\n",
"plt.ylabel('LML estimate')\n",
"plt.title('LML bounds for SGPR with fixed hyperparameters')\n",
"plt.legend()"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"Now that the hyperparameters are fixed, the upper bound _does_ decrease monotonically. We used the hyperparameters found earlier here, but the picture should be the same for any hyperparameter setting. This shows that our estimate of the marginal likelihood improves as we add more inducing points."

]
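}, {

"cell_type": "markdown", "metadata": {}, "source": [
"For nested sets of inducing inputs and fixed hyperparameters, this behaviour can also be argued directly. Adding an inducing point increases $\\mathbf{Q}_{ff}$ by a positive semi-definite matrix whose largest eigenvalue is at most the resulting decrease in $t = \\operatorname{tr}(\\mathbf{K}_{ff} - \\mathbf{Q}_{ff})$, so the upper bound cannot increase. The lower bound cannot decrease, because the variational family with more inducing points contains the one with fewer. The minimal NumPy sketch below checks this numerically. It is not GPflow's implementation: it assumes a squared-exponential kernel and the bound forms of Titsias (2009, 2014). The helpers (`rbf`, `log_normal`, `lml_bounds`), the toy data, the hyperparameters and the nested ordering `z_order` are hypothetical. Unlike in the GPflow experiment, the inducing inputs here are nested rather than optimised.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rbf(a, b, var, ell):\n",
"    # squared-exponential kernel matrix between two sets of 1-D inputs\n",
"    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)\n",
"\n",
"def log_normal(y, cov):\n",
"    # log N(y | 0, cov), computed through a Cholesky factor\n",
"    L = np.linalg.cholesky(cov)\n",
"    a = np.linalg.solve(L, y)\n",
"    return -0.5 * a.dot(a) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)\n",
"\n",
"def lml_bounds(x, y, z, var, ell, noise, jitter=1e-6):\n",
"    # returns (lower bound, exact LML, upper bound) for inducing inputs z\n",
"    Kff, I = rbf(x, x, var, ell), np.eye(len(x))\n",
"    Kuu = rbf(z, z, var, ell) + jitter * np.eye(len(z))\n",
"    A = np.linalg.solve(np.linalg.cholesky(Kuu), rbf(z, x, var, ell))\n",
"    Qff = A.T.dot(A)  # Nystrom approximation of Kff\n",
"    t = np.trace(Kff - Qff)  # prior variance not captured by Qff\n",
"    lower = log_normal(y, Qff + noise * I) - 0.5 * t / noise\n",
"    # upper bound: log-det of Qff + noise*I, quadratic term with Qff + (noise + t)*I\n",
"    b = np.linalg.solve(np.linalg.cholesky(Qff + (noise + t) * I), y)\n",
"    logdet = 2 * np.log(np.diag(np.linalg.cholesky(Qff + noise * I))).sum()\n",
"    upper = -0.5 * (b.dot(b) + logdet + len(y) * np.log(2 * np.pi))\n",
"    return lower, log_normal(y, Kff + noise * I), upper\n",
"\n",
"rng = np.random.RandomState(0)  # hypothetical toy data\n",
"x = np.linspace(0, 6, 50)\n",
"y = np.sin(x) + 0.1 * rng.randn(50)\n",
"\n",
"z_order = np.array([3.0, 0.0, 6.0, 1.5, 4.5, 0.75, 5.25])  # nested sets: first m entries\n",
"lowers, uppers = [], []\n",
"for m in range(1, len(z_order) + 1):\n",
"    lo, ex, up = lml_bounds(x, y, z_order[:m], 1.0, 1.0, 0.01)\n",
"    lowers.append(lo)\n",
"    uppers.append(up)\n",
"    print('M = %d: lower %.3f, upper %.3f (exact %.3f)' % (m, lo, up, ex))\n",
"```"
]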

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"## A tight bound does not imply a converged model"

]

}, {

"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [

{

"name": "stdout", "output_type": "stream", "text": [

"INFO:tensorflow:Optimization terminated with:\n",
"  Message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'\n",
"  Objective function value: 62.487090\n",
"  Number of iterations: 46\n",
"  Number of functions evaluations: 57\n",
"Lower bound: -62.487090\n",
"Upper bound: -62.484641\n"

]

}

], "source": [

"# a single inducing point, initialised at the first training input\n",
"vfe = gpflow.models.SGPR(X, Y, gpflow.kernels.RBF(1), X[None, 0, :].copy())\n",
"vfe.compile()\n",
"gpflow.train.ScipyOptimizer().minimize(vfe)\n",
"print('Lower bound: %f' % vfe.compute_log_likelihood())\n",
"print('Upper bound: %f' % vfe.compute_upper_bound())"

]

}, {

"cell_type": "markdown", "metadata": {}, "source": [

"In this case the bound is very tight for the hyperparameter setting that was found: the lower and upper bounds differ by only about 0.0025. However, this does _not_ imply that we have enough inducing points. It only means that we have correctly identified the marginal likelihood for this particular hyperparameter setting. Here, with a single inducing point, the model collapses to barely using the GP at all. The lengthscale becomes very long (about 1000), so the GP only models the mean, and the rest of the variance is explained by noise. Such a GP can be approximated almost perfectly with a single inducing point."

]
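}, {

"cell_type": "markdown", "metadata": {}, "source": [
"The effect of the lengthscale on a single inducing point can be reproduced with a self-contained NumPy sketch. It is not GPflow's implementation: the bound forms are assumed from Titsias (2009, 2014), and the helpers, toy data and hyperparameters are hypothetical. The kernel variance of 0.1 and noise variance of 0.7 are chosen loosely after the fitted values reported in this section. With a very long lengthscale, the gap between the bounds is tiny. With a short lengthscale, the same single inducing point leaves a large gap.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rbf(a, b, var, ell):\n",
"    # squared-exponential kernel matrix between two sets of 1-D inputs\n",
"    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)\n",
"\n",
"def log_normal(y, cov):\n",
"    # log N(y | 0, cov), computed through a Cholesky factor\n",
"    L = np.linalg.cholesky(cov)\n",
"    a = np.linalg.solve(L, y)\n",
"    return -0.5 * a.dot(a) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)\n",
"\n",
"def lml_bounds(x, y, z, var, ell, noise, jitter=1e-6):\n",
"    # returns (lower bound, exact LML, upper bound) for inducing inputs z\n",
"    Kff, I = rbf(x, x, var, ell), np.eye(len(x))\n",
"    Kuu = rbf(z, z, var, ell) + jitter * np.eye(len(z))\n",
"    A = np.linalg.solve(np.linalg.cholesky(Kuu), rbf(z, x, var, ell))\n",
"    Qff = A.T.dot(A)  # Nystrom approximation of Kff\n",
"    t = np.trace(Kff - Qff)  # prior variance not captured by Qff\n",
"    lower = log_normal(y, Qff + noise * I) - 0.5 * t / noise\n",
"    # upper bound: log-det of Qff + noise*I, quadratic term with Qff + (noise + t)*I\n",
"    b = np.linalg.solve(np.linalg.cholesky(Qff + (noise + t) * I), y)\n",
"    logdet = 2 * np.log(np.diag(np.linalg.cholesky(Qff + noise * I))).sum()\n",
"    upper = -0.5 * (b.dot(b) + logdet + len(y) * np.log(2 * np.pi))\n",
"    return lower, log_normal(y, Kff + noise * I), upper\n",
"\n",
"rng = np.random.RandomState(0)  # hypothetical toy data\n",
"x = np.linspace(0, 6, 50)\n",
"y = np.sin(x) + 0.1 * rng.randn(50)\n",
"\n",
"z = np.array([3.0])  # a single inducing input\n",
"gaps = {}\n",
"for ell in [1000.0, 0.5]:\n",
"    lo, ex, up = lml_bounds(x, y, z, 0.1, ell, 0.7)\n",
"    gaps[ell] = up - lo\n",
"    print('lengthscale %g: lower %.4f, upper %.4f, gap %.2e' % (ell, lo, up, gaps[ell]))\n",
"```"
]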

}, {

"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [

{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
"    .dataframe thead tr:only-child th {\n",
"        text-align: right;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: left;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>class</th>\n",
"      <th>prior</th>\n",
"      <th>transform</th>\n",
"      <th>trainable</th>\n",
"      <th>shape</th>\n",
"      <th>fixed_shape</th>\n",
"      <th>value</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>SGPR/kern/variance</th>\n",
"      <td>Parameter</td>\n",
"      <td>None</td>\n",
"      <td>+ve</td>\n",
"      <td>True</td>\n",
"      <td>()</td>\n",
"      <td>True</td>\n",
"      <td>0.10775260845160224</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>SGPR/kern/lengthscales</th>\n",
"      <td>Parameter</td>\n",
"      <td>None</td>\n",
"      <td>+ve</td>\n",
"      <td>True</td>\n",
"      <td>()</td>\n",
"      <td>True</td>\n",
"      <td>1014.3847358198036</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>SGPR/likelihood/variance</th>\n",
"      <td>Parameter</td>\n",
"      <td>None</td>\n",
"      <td>+ve</td>\n",
"      <td>True</td>\n",
"      <td>()</td>\n",
"      <td>True</td>\n",
"      <td>0.682431983343512</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>SGPR/feature/Z</th>\n",
"      <td>Parameter</td>\n",
"      <td>None</td>\n",
"      <td>(none)</td>\n",
"      <td>True</td>\n",
"      <td>(1, 1)</td>\n",
"      <td>True</td>\n",
"      <td>[[2.65740317504]]</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"

], "text/plain": [

" class prior transform trainable shape \\\n",
"SGPR/kern/variance Parameter None +ve True () \n",
"SGPR/kern/lengthscales Parameter None +ve True () \n",
"SGPR/likelihood/variance Parameter None +ve True () \n",
"SGPR/feature/Z Parameter None (none) True (1, 1) \n",
"\n",
" fixed_shape value \n",
"SGPR/kern/variance True 0.10775260845160224 \n",
"SGPR/kern/lengthscales True 1014.3847358198036 \n",
"SGPR/likelihood/variance True 0.682431983343512 \n",
"SGPR/feature/Z True [[2.65740317504]] "

]

}, "execution_count": 19, "metadata": {}, "output_type": "execute_result"

}

], "source": [

"vfe.as_pandas_table()"

]

}, {

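"cell_type": "markdown", "metadata": {}, "source": [

"This can be diagnosed by showing that other hyperparameter settings have higher upper bounds. This indicates that better hyperparameter settings may exist, but that we cannot identify them with so few inducing points. If another setting even has a _lower_ bound above the current upper bound, the current setting is certainly not optimal. An example of this can be seen earlier in this notebook. The SGPR models with 4 to 19 inducing points reach lower bounds between about -49 and -24, well above the upper bound of about -62.48 found here."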

]
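}, {

"cell_type": "markdown", "metadata": {}, "source": [
"The decision rule behind this diagnosis can be written as a small function. This self-contained NumPy sketch is not GPflow's implementation. The helper `compare`, the bound helpers (`rbf`, `log_normal`, `lml_bounds`), the toy data and both hyperparameter settings are hypothetical, and the bound forms are assumed from Titsias (2009, 2014). Setting A mimics the collapsed one-inducing-point solution. Setting B uses a shorter lengthscale and more inducing points.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rbf(a, b, var, ell):\n",
"    # squared-exponential kernel matrix between two sets of 1-D inputs\n",
"    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)\n",
"\n",
"def log_normal(y, cov):\n",
"    # log N(y | 0, cov), computed through a Cholesky factor\n",
"    L = np.linalg.cholesky(cov)\n",
"    a = np.linalg.solve(L, y)\n",
"    return -0.5 * a.dot(a) - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)\n",
"\n",
"def lml_bounds(x, y, z, var, ell, noise, jitter=1e-6):\n",
"    # returns (lower bound, exact LML, upper bound) for inducing inputs z\n",
"    Kff, I = rbf(x, x, var, ell), np.eye(len(x))\n",
"    Kuu = rbf(z, z, var, ell) + jitter * np.eye(len(z))\n",
"    A = np.linalg.solve(np.linalg.cholesky(Kuu), rbf(z, x, var, ell))\n",
"    Qff = A.T.dot(A)  # Nystrom approximation of Kff\n",
"    t = np.trace(Kff - Qff)  # prior variance not captured by Qff\n",
"    lower = log_normal(y, Qff + noise * I) - 0.5 * t / noise\n",
"    # upper bound: log-det of Qff + noise*I, quadratic term with Qff + (noise + t)*I\n",
"    b = np.linalg.solve(np.linalg.cholesky(Qff + (noise + t) * I), y)\n",
"    logdet = 2 * np.log(np.diag(np.linalg.cholesky(Qff + noise * I))).sum()\n",
"    upper = -0.5 * (b.dot(b) + logdet + len(y) * np.log(2 * np.pi))\n",
"    return lower, log_normal(y, Kff + noise * I), upper\n",
"\n",
"def compare(current, other):\n",
"    # current, other: (lower, upper) bounds on the LML of two hyperparameter settings\n",
"    if other[0] > current[1]:\n",
"        return 'certainly better'  # other's lower bound exceeds current's upper bound\n",
"    if other[1] > current[0]:\n",
"        return 'possibly better'  # the bounds cannot rule it out\n",
"    return 'certainly worse'\n",
"\n",
"rng = np.random.RandomState(0)  # hypothetical toy data\n",
"x = np.linspace(0, 6, 50)\n",
"y = np.sin(x) + 0.1 * rng.randn(50)\n",
"\n",
"# setting A: one inducing point, very long lengthscale (collapsed solution)\n",
"lo_a, _, up_a = lml_bounds(x, y, np.array([3.0]), 0.1, 1000.0, 0.5)\n",
"# setting B: shorter lengthscale, 15 inducing points\n",
"lo_b, _, up_b = lml_bounds(x, y, np.linspace(0, 6, 15), 1.0, 1.0, 0.01)\n",
"verdict = compare((lo_a, up_a), (lo_b, up_b))\n",
"print('A: [%.2f, %.2f], B: [%.2f, %.2f], B is %s' % (lo_a, up_a, lo_b, up_b, verdict))\n",
"```"
]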

}

], "metadata": {

"kernelspec": {
"display_name": "Python [default]", "language": "python", "name": "python3"

}, "language_info": {

"codemirror_mode": {
"name": "ipython", "version": 3

}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0"

}

}, "nbformat": 4, "nbformat_minor": 1

}