himalaya.kernel_ridge.solve_multiple_kernel_ridge_hyper_gradient

himalaya.kernel_ridge.solve_multiple_kernel_ridge_hyper_gradient(Ks, Y, score_func=<function l2_neg_loss>, cv=5, fit_intercept=False, return_weights=None, Xs=None, initial_deltas=0, max_iter=10, tol=0.01, max_iter_inner_dual=1, max_iter_inner_hyper=1, cg_tol=0.001, n_targets_batch=None, hyper_gradient_method='conjugate_gradient', kernel_ridge_method='gradient_descent', random_state=None, progress_bar=True, Y_in_cpu=False)

Solve bilinear kernel ridge regression with cross-validation.

The hyper-parameters deltas correspond to:

log(kernel_weights / ridge_regularization)
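
For intuition, this parameterization folds the kernel weights and the ridge penalty into a single set of hyper-parameters: with kernel weights gamma and regularization alpha, the ridge system (sum_i gamma_i K_i + alpha I) is proportional to (sum_i exp(deltas_i) K_i + I) when deltas_i = log(gamma_i / alpha). The small NumPy check below illustrates this equivalence; it is a sketch only, and the variable names are not part of the API.

    import numpy as np

    # Illustrative only: deltas_i = log(gamma_i / alpha), for a single target.
    n_kernels, n_samples = 3, 20
    rng = np.random.RandomState(0)
    Xs = [rng.randn(n_samples, 5) for _ in range(n_kernels)]
    Ks = np.stack([X @ X.T for X in Xs])      # one linear kernel per feature space

    gamma = np.array([0.5, 1.0, 2.0])         # kernel weights
    alpha = 10.0                              # ridge regularization
    deltas = np.log(gamma / alpha)

    # (sum_i gamma_i K_i + alpha I) equals alpha * (sum_i exp(deltas_i) K_i + I),
    # so both systems give the same dual weights up to the scale absorbed in alpha.
    lhs_1 = np.tensordot(gamma, Ks, axes=1) + alpha * np.eye(n_samples)
    lhs_2 = alpha * (np.tensordot(np.exp(deltas), Ks, axes=1) + np.eye(n_samples))
    assert np.allclose(lhs_1, lhs_2)
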
Parameters
Ks : array of shape (n_kernels, n_samples, n_samples)

Training kernel for each feature space.

Y : array of shape (n_samples, n_targets)

Training target data.

score_func : callable

Function used to compute the score of predictions.

cv : int or scikit-learn splitter

Cross-validation splitter. If an int, KFold is used.

fit_intercept : boolean

Whether to fit an intercept. If False, Ks should be centered (see KernelCenterer) and Y must be zero-mean over samples; a preprocessing sketch is given after this parameter list. Only available if return_weights == ‘dual’.

return_weights : None, ‘primal’, or ‘dual’

Whether to refit on the entire dataset and return the weights.

Xs : array of shape (n_kernels, n_samples, n_features) or None

Necessary if return_weights == ‘primal’.

initial_deltas : str, float, or array of shape (n_kernels, n_targets)

Initial log kernel weights for each target. If a float, initialize the deltas with this value. If a str, initialize the deltas with one of the following strategies:

- ‘ridgecv’ : fit a RidgeCV model over the average kernel.

max_iter : int

Maximum number of iterations for the outer loop.

tol : float > 0, or None

Tolerance for the stopping criterion.

max_iter_inner_dual : int

Maximum number of iterations for the dual weights conjugate gradient.

max_iter_inner_hyper : int

Maximum number of iterations for the deltas gradient descent.

cg_tol : float, or array of shape (max_iter)

Tolerance for the conjugate gradients.

n_targets_batch : int or None

Size of the batch for computing predictions. Used for memory reasons. If None, uses all n_targets at once.

hyper_gradient_method : str, “conjugate_gradient”, “neumann”, or “direct”

Method to compute the hypergradient.

kernel_ridge_method : str, “conjugate_gradient” or “gradient_descent”

Algorithm used for the inner step.

random_state : int, or None

Random generator seed. Use an int for deterministic search.

progress_bar : bool

If True, display a progress bar over batches and iterations.

Y_in_cpu : bool

If True, keep the target values Y in CPU memory (slower).
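
As noted in the fit_intercept entry above, with the default fit_intercept=False the kernels should be centered and Y should be zero-mean over samples. Below is a minimal preprocessing sketch using scikit-learn's KernelCenterer; the random feature spaces and variable names are purely illustrative, not part of the API.

    import numpy as np
    from sklearn.preprocessing import KernelCenterer

    rng = np.random.RandomState(0)
    Xs = [rng.randn(50, 10), rng.randn(50, 20)]   # two feature spaces (illustrative)
    Ks = np.stack([X @ X.T for X in Xs])          # one linear kernel per feature space
    Y = rng.randn(50, 4)

    # Center each training kernel and make Y zero-mean over samples.
    Ks = np.stack([KernelCenterer().fit_transform(K) for K in Ks])
    Y = Y - Y.mean(0)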

Returns
deltas : array of shape (n_kernels, n_targets)

Best log kernel weights for each target.

refit_weights : array or None

Regression weights refit on the entire dataset, using the selected best hyperparameters. Refit weights are always returned on CPU memory. If return_weights == ‘primal’, the shape is (n_features, n_targets); if return_weights == ‘dual’, the shape is (n_samples, n_targets); otherwise None.

cv_scores : array of shape (max_iter * max_iter_inner_hyper, n_targets)

Cross-validation scores per iteration, averaged over splits. Cross-validation scores will always be on CPU memory.

intercept : array of shape (n_targets,)

Intercept. Only returned when fit_intercept is True.
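
A minimal end-to-end sketch of a call on random data with two linear kernels (illustrative only; the unpacking follows the order of the Returns section above, with the default fit_intercept=False):

    import numpy as np
    from sklearn.preprocessing import KernelCenterer
    from himalaya.kernel_ridge import solve_multiple_kernel_ridge_hyper_gradient

    rng = np.random.RandomState(0)
    Xs = [rng.randn(50, 10), rng.randn(50, 20)]   # two feature spaces (random, illustrative)
    Ks = np.stack([KernelCenterer().fit_transform(X @ X.T) for X in Xs])
    Y = rng.randn(50, 4)
    Y -= Y.mean(0)                                # zero-mean targets, since fit_intercept=False

    deltas, dual_weights, cv_scores = solve_multiple_kernel_ridge_hyper_gradient(
        Ks, Y, cv=5, max_iter=10, return_weights="dual", progress_bar=False)

    print(deltas.shape)        # (n_kernels, n_targets) = (2, 4)
    print(dual_weights.shape)  # (n_samples, n_targets) = (50, 4), since return_weights="dual"
    print(cv_scores.shape)     # cross-validation scores per iteration, averaged over splits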