himalaya.ridge.GroupRidgeCV

class himalaya.ridge.GroupRidgeCV(groups=None, solver='random_search', solver_params=None, fit_intercept=False, cv=5, random_state=None, Y_in_cpu=False, force_cpu=False)

Group ridge regression with cross-validation.

Solve the group-regularized ridge regression:

b* = argmin_b ||Z @ b - Y||^2 + ||b||^2

where Z is the concatenation of the feature spaces X_i, each scaled by a group scaling

Z_i = exp(deltas[i] / 2) X_i

The solver optimizes the log group scalings deltas over cross-validation, using random search (solver="random_search").
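
To make the objective concrete, here is a minimal NumPy sketch of the inner problem for one fixed set of log group scalings. It does not use himalaya and is not the library's solver: the two feature spaces, the single target, and the values of deltas are assumptions chosen for illustration, and the random search over deltas is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_targets = 20, 1

# Two hypothetical feature spaces (groups) with 3 and 2 features.
Xs = [rng.standard_normal((n_samples, 3)),
      rng.standard_normal((n_samples, 2))]
Y = rng.standard_normal((n_samples, n_targets))

# Fixed log group scalings, one per group (illustrative values only).
deltas = np.array([0.5, -1.0])

# Z_i = exp(deltas[i] / 2) X_i, concatenated along the feature axis.
Z = np.concatenate(
    [np.exp(delta / 2) * X for delta, X in zip(deltas, Xs)], axis=1)

# Closed-form minimizer of ||Z @ b - Y||^2 + ||b||^2:
# b* = (Z^T Z + I)^{-1} Z^T Y
b = np.linalg.solve(Z.T @ Z + np.eye(Z.shape[1]), Z.T @ Y)
```

Because the regularization strength is fixed to one, all of the regularization is carried by the scalings: a larger deltas[i] makes group i cheaper to use, and a smaller one shrinks its contribution.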

Parameters

groups : array of shape (n_features, ), "input", or None
    Encoding of the group of each feature. If None, all features are gathered in one group, and the problem is equivalent to RidgeCV. If "input", the input features X should be a list of 2D arrays, corresponding to each group.
solver : str
    Algorithm used during the fit, only "random_search" for now.
solver_params : dict or None
    Additional parameters for the solver. See more details in the docstring of the function: GroupRidgeCV.ALL_SOLVERS[solver]
fit_intercept : boolean
    Whether to fit an intercept. If False, X and Y must be zero-mean over samples.
cv : int or scikit-learn splitter
    Cross-validation splitter. If an int, KFold is used.
random_state : int, or None
    Random generator seed. Use an int for deterministic search.
Y_in_cpu : bool
    If True, keep the target values y in CPU memory (slower).
force_cpu : bool
    If True, computations will be performed on CPU, ignoring the current backend. If False, use the current backend.
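
Two of the parameters above put requirements on the input data, which the following NumPy sketch illustrates: centering X and Y over samples when fit_intercept is False, and the relation between a per-feature groups array and the list-of-arrays form expected with groups="input". The integer group labels are an assumption made for this example; the description above only says that groups encodes the group of each feature.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5)) + 3.0  # features with a nonzero mean
Y = rng.standard_normal((10, 3)) - 1.0  # targets with a nonzero mean

# With fit_intercept=False, X and Y must be zero-mean over samples (axis 0).
X_centered = X - X.mean(axis=0)
Y_centered = Y - Y.mean(axis=0)

# Hypothetical group encoding: one label per feature, shape (n_features, ).
groups = np.array([0, 0, 0, 1, 1])

# The same grouping expressed as a list of 2D arrays, one per group.
Xs = [X_centered[:, groups == label] for label in np.unique(groups)]
```

Here Xs holds two arrays of shape (10, 3) and (10, 2), the same split that the ColumnTransformerNoStack example below produces from column indices.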

Examples

>>> from himalaya.ridge import GroupRidgeCV
>>> from himalaya.ridge import ColumnTransformerNoStack
>>> from sklearn.pipeline import make_pipeline

>>> # create a dataset
>>> import numpy as np
>>> n_samples, n_features, n_targets = 10, 5, 3
>>> X = np.random.randn(n_samples, n_features)
>>> Y = np.random.randn(n_samples, n_targets)

>>> # Separate the first three columns and the last two
>>> # columns, creating two groups of shape (n_samples, n_feature_i).
>>> from sklearn.preprocessing import StandardScaler
>>> ct = ColumnTransformerNoStack(
...     [("group_1", StandardScaler(), [0, 1, 2]),
...      ("group_2", StandardScaler(), slice(3, 5))])

>>> # A model with automatic groups, as output by ColumnTransformerNoStack
>>> model = GroupRidgeCV(groups="input")
>>> pipe = make_pipeline(ct, model)
>>> _ = pipe.fit(X, Y)

Attributes

coef_ : array of shape (n_features) or (n_features, n_targets)
    Ridge coefficients.
intercept_ : float or array of shape (n_targets, )
    Intercept. Only returned when fit_intercept is True.
deltas_ : array of shape (n_groups, n_targets)
    Log of the group scalings.
cv_scores_ : array of shape (n_iter, n_targets)
    Cross-validation scores, averaged over splits. By default, the scores are computed with l2_neg_loss (in ]-inf, 0]). The scoring function can be changed with solver_params["score_func"].
n_features_in_ : int
    Number of features used during the fit.
dtype_ : str
    Dtype of input data.
best_alphas_ : array of shape (n_targets, )
    Equal to 1. / exp(self.deltas_).sum(0). For the "random_search" solver, it corresponds to the best hyperparameter alphas, assuming that each squared group scaling vector sums to one (in particular, it is the case when solver_params['n_iter'] is an integer).
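
The relation between deltas_ and best_alphas_ can be checked numerically. The sketch below assumes that each log scaling has the form delta = log(gamma / alpha), where gamma is a squared group scaling that sums to one over groups for each target. This parametrization is an assumption consistent with the description above, and the arrays are made up for illustration rather than taken from a fitted model.

```python
import numpy as np

# Hypothetical squared group scalings for 2 groups and 3 targets.
# Each column sums to one over groups.
gammas = np.array([[0.2, 0.5, 0.9],
                   [0.8, 0.5, 0.1]])

# Hypothetical regularization strength for each target.
alphas = np.array([1.0, 10.0, 100.0])

# Assumed parametrization: delta = log(gamma / alpha).
deltas = np.log(gammas / alphas)

# Formula from the attribute description: 1. / exp(deltas).sum(0)
best_alphas = 1.0 / np.exp(deltas).sum(0)
```

Under this assumption, exp(deltas) sums to 1 / alpha over groups, so best_alphas recovers alphas exactly. If the columns of gammas did not sum to one, the recovered value would be alpha divided by that sum.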

Methods

`fit`(X[, y])	Fit the model.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X[, split])	Predict using the model.
`score`(X, y[, split])	Return the coefficient of determination R^2 of the prediction.
`set_params`(**params)	Set the parameters of this estimator.
