Multiple-kernel ridge path between two kernels
This example demonstrates the path of all possible ratios of kernel weights between two kernels, in a multiple kernel ridge regression model. For each ratio on the path, the two kernels are scaled by their respective weights and summed, and a joint model is fit on the resulting kernel. The explained variance on a test set is then computed, and decomposed over both kernels.
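As a minimal sketch of this weighting step (plain NumPy, with small hypothetical kernels K_A and K_B standing in for the two precomputed kernels), the joint model is fit on the weighted sum of the kernels:

import numpy as np

n_samples = 4
K_A = np.eye(n_samples)                # hypothetical kernel of feature space A
K_B = np.ones((n_samples, n_samples))  # hypothetical kernel of feature space B
gamma_A, gamma_B = 0.6, 0.4            # one pair of kernel weights on the path
K = gamma_A * K_A + gamma_B * K_B      # combined kernel for the joint model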
from functools import partial
import numpy as np
import matplotlib.pyplot as plt
from himalaya.backend import set_backend
from himalaya.kernel_ridge import MultipleKernelRidgeCV
from himalaya.kernel_ridge import Kernelizer
from himalaya.kernel_ridge import ColumnKernelizer
from himalaya.progress_bar import bar
from himalaya.utils import generate_multikernel_dataset
from sklearn.pipeline import make_pipeline
from sklearn import set_config
set_config(display='diagram')
In this example, we use the cupy backend.
backend = set_backend("cupy", on_error="warn")
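If cupy is not available, on_error="warn" only raises a warning instead of an error, and the previous backend is kept. As a quick sanity check (assuming the returned backend module exposes a name attribute, as in recent himalaya versions):

print(backend.name)  # "cupy", or the fallback backend if cupy is unavailable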
Generate a random dataset
X_train : array of shape (n_samples_train, n_features)
X_test : array of shape (n_samples_test, n_features)
Y_train : array of shape (n_samples_train, n_targets)
Y_test : array of shape (n_samples_test, n_targets)
n_targets = 50
kernel_weights = np.tile(np.array([0.6, 0.4])[None], (n_targets, 1))
(X_train, X_test, Y_train, Y_test,
kernel_weights, n_features_list) = generate_multikernel_dataset(
n_kernels=2, n_targets=n_targets, n_samples_train=600,
n_samples_test=300, random_state=42, noise=0.31,
kernel_weights=kernel_weights)
feature_names = [f"Feature space {ii}" for ii in range(len(n_features_list))]
Create a MultipleKernelRidgeCV model; see plot_mkr_sklearn_api.py for more details.
# Find the start and end of each feature space X in Xs.
start_and_end = np.concatenate([[0], np.cumsum(n_features_list)])
slices = [
slice(start, end)
for start, end in zip(start_and_end[:-1], start_and_end[1:])
]
# Create a different ``Kernelizer`` for each feature space.
kernelizers = [(name, Kernelizer(), slice_)
for name, slice_ in zip(feature_names, slices)]
column_kernelizer = ColumnKernelizer(kernelizers)
# Create a MultipleKernelRidgeCV model.
solver_params = dict(alphas=np.logspace(-5, 5, 41), progress_bar=False)
model = MultipleKernelRidgeCV(kernels="precomputed", solver="random_search",
solver_params=solver_params,
random_state=42)
pipe = make_pipeline(column_kernelizer, model)
pipe
Then, we manually perform a hyperparameter grid search for the kernel weights.
# Make the score method use `split=True` by default.
model.score = partial(model.score, split=True)
# Define the hyperparameter grid search.
ratios = np.logspace(-4, 4, 41)
candidates = np.array([1 - ratios / (1 + ratios), ratios / (1 + ratios)]).T
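# Each ratio r maps to the pair of kernel weights (1 / (1 + r), r / (1 + r)),
# so the weights of every candidate sum to one. Sanity check:
assert np.allclose(candidates.sum(axis=1), 1)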
# Loop over hyperparameter candidates
split_r2_scores = []
for candidate in bar(candidates, "Hyperparameter candidates"):
    # Test one hyperparameter candidate at a time.
    pipe[-1].solver_params["n_iter"] = candidate[None]
    pipe.fit(X_train, Y_train)
    # Split the R2 score between both kernels.
    scores = pipe.score(X_test, Y_test)
    split_r2_scores.append(backend.to_numpy(scores))

# Average the scores over targets for plotting.
split_r2_scores_avg = np.array(split_r2_scores).mean(axis=2)
Out:
[........................................] 100% | 7.57 sec | Hyperparameter candidates |
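Each row of split_r2_scores_avg corresponds to one ratio candidate, with the test R2 score decomposed over the two kernels; the split scores are defined so that they sum to the R2 score of the joint model. A minimal shape check (assuming the sizes used above):

print(split_r2_scores_avg.shape)  # (n_candidates, n_kernels) = (41, 2)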
Plot the variance decomposition for all the hyperparameter ratios.
For a ratio of 1e-3, feature space 0 is almost not used. For a ratio of 1e3, feature space 1 is almost not used. Here, the best ratio is around 1, because the two feature spaces are used at similar scales in the simulated dataset.
fig, ax = plt.subplots(figsize=(5, 4))
accumulator = np.zeros_like(ratios)
for split in split_r2_scores_avg.T:
    ax.fill_between(ratios, accumulator, accumulator + split, alpha=0.7)
    accumulator += split
ax.set(xscale='log', xlabel=r"Ratio of kernel weight ($\gamma_A / \gamma_B$)",
       ylabel=r"$R^2$ score (test set)", title=r"$R^2$ score decomposition")
ax.legend(feature_names, loc="upper left")
ax.grid()
fig.tight_layout()
plt.show()
Total running time of the script: (0 minutes 7.662 seconds)