Multiple-kernel ridge path between two kernels

This example traces the path of all possible kernel-weight ratios between two kernels in a multiple kernel ridge regression model. For each ratio along the path, the kernels are scaled by their weights and summed, and a joint model is fit on the resulting kernel. The explained variance on a test set is then computed and decomposed over both kernels.
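Concretely, for a given pair of kernel weights (gamma_0, gamma_1), the joint kernel is the weighted sum of the two kernels. Below is a minimal NumPy sketch of this combination, using hypothetical linear kernels that are not part of the example's pipeline:

import numpy as np

# Hypothetical data: two feature spaces over the same 20 samples.
rng = np.random.RandomState(0)
X0 = rng.randn(20, 5)
X1 = rng.randn(20, 8)

K0 = X0 @ X0.T  # linear kernel of feature space 0
K1 = X1 @ X1.T  # linear kernel of feature space 1

# The joint kernel on which a single kernel ridge model is fit.
gamma_0, gamma_1 = 0.6, 0.4
K = gamma_0 * K0 + gamma_1 * K1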

from functools import partial

import numpy as np
import matplotlib.pyplot as plt

from himalaya.backend import set_backend
from himalaya.kernel_ridge import MultipleKernelRidgeCV
from himalaya.kernel_ridge import Kernelizer
from himalaya.kernel_ridge import ColumnKernelizer
from himalaya.progress_bar import bar
from himalaya.utils import generate_multikernel_dataset

from sklearn.pipeline import make_pipeline
from sklearn import set_config
set_config(display='diagram')

In this example, we use the cupy backend (GPU). With ``on_error="warn"``, himalaya only emits a warning and keeps the default numpy backend if cupy is not available.

backend = set_backend("cupy", on_error="warn")

Generate a random dataset

  • X_train : array of shape (n_samples_train, n_features)

  • X_test : array of shape (n_samples_test, n_features)

  • Y_train : array of shape (n_samples_train, n_targets)

  • Y_test : array of shape (n_samples_test, n_targets)

n_targets = 50
kernel_weights = np.tile(np.array([0.6, 0.4])[None], (n_targets, 1))

(X_train, X_test, Y_train, Y_test,
 kernel_weights, n_features_list) = generate_multikernel_dataset(
     n_kernels=2, n_targets=n_targets, n_samples_train=600,
     n_samples_test=300, random_state=42, noise=0.31,
     kernel_weights=kernel_weights)

feature_names = [f"Feature space {ii}" for ii in range(len(n_features_list))]
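As a quick check, the shapes listed above follow from the dataset parameters; each of the two feature spaces has 1000 features here, as the kernelizer slices below confirm.

print(X_train.shape)  # (600, 2000): two feature spaces of 1000 features each
print(X_test.shape)   # (300, 2000)
print(Y_train.shape)  # (600, 50)
print(Y_test.shape)   # (300, 50)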

Create a MultipleKernelRidgeCV model; see plot_mkr_sklearn_api.py for more details.

# Find the start and end of each feature space X in Xs.
start_and_end = np.concatenate([[0], np.cumsum(n_features_list)])
slices = [
    slice(start, end)
    for start, end in zip(start_and_end[:-1], start_and_end[1:])
]
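# For this dataset, ``n_features_list`` is [1000, 1000], so the slices are
# [slice(0, 1000), slice(1000, 2000)].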

# Create a different ``Kernelizer`` for each feature space.
kernelizers = [(name, Kernelizer(), slice_)
               for name, slice_ in zip(feature_names, slices)]
column_kernelizer = ColumnKernelizer(kernelizers)

# Create a MultipleKernelRidgeCV model.
solver_params = dict(alphas=np.logspace(-5, 5, 41), progress_bar=False)
model = MultipleKernelRidgeCV(kernels="precomputed", solver="random_search",
                              solver_params=solver_params,
                              random_state=42)
pipe = make_pipeline(column_kernelizer, model)
pipe
Pipeline(steps=[('columnkernelizer',
                 ColumnKernelizer(transformers=[('Feature space 0',
                                                 Kernelizer(),
                                                 slice(0, 1000, None)),
                                                ('Feature space 1',
                                                 Kernelizer(),
                                                 slice(1000, 2000, None))])),
                ('multiplekernelridgecv',
                 MultipleKernelRidgeCV(kernels='precomputed', random_state=42,
                                       solver_params={'alphas': array([1.00000000e-05, 1.77827941e-05, 3.16227766e-05, 5.62341325e-05,
       1...
       1.00000000e-01, 1.77827941e-01, 3.16227766e-01, 5.62341325e-01,
       1.00000000e+00, 1.77827941e+00, 3.16227766e+00, 5.62341325e+00,
       1.00000000e+01, 1.77827941e+01, 3.16227766e+01, 5.62341325e+01,
       1.00000000e+02, 1.77827941e+02, 3.16227766e+02, 5.62341325e+02,
       1.00000000e+03, 1.77827941e+03, 3.16227766e+03, 5.62341325e+03,
       1.00000000e+04, 1.77827941e+04, 3.16227766e+04, 5.62341325e+04,
       1.00000000e+05]),
                                                      'progress_bar': False}))])


Then, we manually perform a hyperparameter grid search over the kernel weights.

# Make the score method use ``split=True`` by default.
model.score = partial(model.score, split=True)

# Define the hyperparameter grid search.
ratios = np.logspace(-4, 4, 41)
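# Each candidate is a pair of kernel weights [1 / (1 + r), r / (1 + r)]:
# the weights sum to one and their ratio (kernel 1 over kernel 0) is r.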
candidates = np.array([1 - ratios / (1 + ratios), ratios / (1 + ratios)]).T

# Loop over hyperparameter candidates
split_r2_scores = []
for candidate in bar(candidates, "Hyperparameter candidates"):
    # Test one hyperparameter candidate at a time: passing an array of shape
    # (1, n_kernels) as ``n_iter`` makes the random-search solver evaluate
    # exactly this candidate instead of sampling random ones.
    pipe[-1].solver_params["n_iter"] = candidate[None]
    pipe.fit(X_train, Y_train)

    # With ``split=True``, the R2 score is decomposed over both kernels: the
    # returned scores have shape (n_kernels, n_targets) and sum over kernels
    # to the full R2 score of the joint model.
    scores = pipe.score(X_test, Y_test)
    split_r2_scores.append(backend.to_numpy(scores))

# average scores over targets for plotting, giving shape (n_candidates, n_kernels)
split_r2_scores_avg = np.array(split_r2_scores).mean(axis=2)

Out:

[........................................] 100% | 7.57 sec | Hyperparameter candidates |

Plot the variance decomposition for all the hyperparameter ratios.

Given the candidates defined above, a ratio of 1e-3 gives feature space 1 almost no weight, and a ratio of 1e3 gives feature space 0 almost no weight. The best ratio is here around 1, because both feature spaces contribute at similar scales in the simulated dataset.
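To read off the best ratio numerically, here is a small sketch using the arrays computed above: since the split scores sum over kernels to the full $R^2$ score, the best candidate maximizes their sum.

total_r2 = split_r2_scores_avg.sum(axis=1)  # total R2 score per candidate
best = np.argmax(total_r2)
print(f"Best ratio: {ratios[best]:.2g} (total R2 = {total_r2[best]:.3f})")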

fig, ax = plt.subplots(figsize=(5, 4))
accumulator = np.zeros_like(ratios)
for split in split_r2_scores_avg.T:
    ax.fill_between(ratios, accumulator, accumulator + split, alpha=0.7)
    accumulator += split

ax.set(xscale='log')
ax.set(xlabel=r"Ratio of kernel weight ($\gamma_A / \gamma_B$)")
ax.set(ylabel=r"$R^2$ score (test set)")
ax.set(title=r"$R^2$ score decomposition")
ax.legend(feature_names, loc="upper left")
ax.grid()
fig.tight_layout()
plt.show()
[Figure: stacked area plot titled "$R^2$ score decomposition", showing the split $R^2$ score of each feature space as a function of the kernel weight ratio.]

Total running time of the script: ( 0 minutes 7.662 seconds)

Gallery generated by Sphinx-Gallery