himalaya.ridge.GroupRidgeCV
- class himalaya.ridge.GroupRidgeCV(groups=None, solver='random_search', solver_params=None, fit_intercept=False, cv=5, random_state=None, Y_in_cpu=False, force_cpu=False)
Group ridge regression with cross-validation.
Solve the group-regularized ridge regression:
b* = argmin_b ||Z @ b - Y||^2 + ||b||^2
where the feature space X_i is scaled by a group scaling
Z_i = exp(deltas[i] / 2) X_i
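For intuition, the scaling above can be written out explicitly. This is a minimal NumPy sketch with two hypothetical feature groups and arbitrary deltas, not the estimator's internal code.

>>> import numpy as np
>>> X_1 = np.random.randn(10, 3)    # first feature group
>>> X_2 = np.random.randn(10, 2)    # second feature group
>>> deltas = np.array([0.5, -1.0])  # one log scaling per group (per group and target in the fitted model)
>>> # each group is multiplied by exp(deltas[i] / 2) before the joint ridge fit
>>> Z = np.hstack([np.exp(deltas[0] / 2) * X_1,
...                np.exp(deltas[1] / 2) * X_2])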
The solver optimizes the log group scalings deltas over cross-validation, using random search (solver="random_search").
- Parameters
- groups : array of shape (n_features,), "input", or None
Encoding of the group of each feature. If None, all features are gathered in one group, and the problem is equivalent to RidgeCV. If "input", the input features X should be a list of 2D arrays, one per group (see the second example below).
- solver : str
Algorithm used during the fit. Only "random_search" is supported for now.
- solver_params : dict or None
Additional parameters for the solver. See more details in the docstring of the function GroupRidgeCV.ALL_SOLVERS[solver].
- fit_intercept : boolean
Whether to fit an intercept. If False, X and Y must be zero-mean over samples.
- cv : int or scikit-learn splitter
Cross-validation splitter. If an int, KFold is used.
- random_state : int, or None
Random generator seed. Use an int for deterministic search.
- Y_in_cpu : bool
If True, keep the target values Y in CPU memory (slower).
- force_cpu : bool
If True, computations will be performed on CPU, ignoring the current backend. If False, use the current backend.
Examples
>>> from himalaya.ridge import GroupRidgeCV
>>> from himalaya.ridge import ColumnTransformerNoStack
>>> from sklearn.pipeline import make_pipeline

>>> # create a dataset
>>> import numpy as np
>>> n_samples, n_features, n_targets = 10, 5, 3
>>> X = np.random.randn(n_samples, n_features)
>>> Y = np.random.randn(n_samples, n_targets)

>>> # Separate the first three columns and the last two
>>> # columns, creating two groups of shape (n_samples, n_features_i).
>>> from sklearn.preprocessing import StandardScaler
>>> ct = ColumnTransformerNoStack(
...     [("group_1", StandardScaler(), [0, 1, 2]),
...      ("group_2", StandardScaler(), slice(3, 5))])

>>> # A model with automatic groups, as output by ColumnTransformerNoStack
>>> model = GroupRidgeCV(groups="input")
>>> pipe = make_pipeline(ct, model)
>>> _ = pipe.fit(X, Y)
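The group of each feature can also be given explicitly, instead of being inferred from the input. Below is a minimal sketch reusing X and Y from above, with hypothetical group indices (the encoding follows the groups parameter description).

>>> # One group index per feature: columns 0-2 in group 0, columns 3-4 in group 1.
>>> model = GroupRidgeCV(groups=np.array([0, 0, 0, 1, 1]))
>>> _ = model.fit(X, Y)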
- Attributes
- coef_ : array of shape (n_features,) or (n_features, n_targets)
Ridge coefficients.
- intercept_ : float or array of shape (n_targets,)
Intercept. Only returned when fit_intercept is True.
- deltas_ : array of shape (n_groups, n_targets)
Log of the group scalings.
- cv_scores_ : array of shape (n_iter, n_targets)
Cross-validation scores, averaged over splits. By default, the scores are computed with l2_neg_loss (in ]-inf, 0]). The scoring function can be changed with solver_params["score_func"].
- n_features_in_ : int
Number of features used during the fit.
- dtype_ : str
Dtype of input data.
- best_alphas_ : array of shape (n_targets,)
Equal to 1. / exp(self.deltas_).sum(0). For the "random_search" solver, it corresponds to the best hyperparameter alphas, assuming that each squared group scaling vector sums to one (in particular, this is the case when solver_params['n_iter'] is an integer).
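As an illustration, the relation above can be checked after a fit. This is a minimal sketch assuming the array-groups model fitted in the second example and a NumPy backend (with another backend, deltas_ may not be a NumPy array).

>>> import numpy as np
>>> alphas = 1. / np.exp(model.deltas_).sum(0)  # matches model.best_alphas_ per the formula above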
Methods
- fit(X[, y]): Fit the model.
- get_params([deep]): Get parameters for this estimator.
- predict(X[, split]): Predict using the model.
- score(X, y[, split]): Return the coefficient of determination R^2 of the prediction.
- set_params(**params): Set the parameters of this estimator.
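Typical usage of these methods, as a sketch continuing the pipeline example above (pipe, X, and Y are taken from that example):

>>> Y_pred = pipe.predict(X)   # predictions, shape (n_samples, n_targets)
>>> r2 = pipe.score(X, Y)      # R^2 of the prediction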