himalaya.ridge.GroupRidgeCV
- class himalaya.ridge.GroupRidgeCV(groups=None, solver='random_search', solver_params=None, fit_intercept=False, cv=5, random_state=None, Y_in_cpu=False, force_cpu=False)
Group ridge regression with cross-validation.
Solve the group-regularized ridge regression:
b* = argmin_b ||Z @ b - Y||^2 + ||b||^2
where the feature space X_i is scaled by a group scaling
Z_i = exp(deltas[i] / 2) X_i
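As a rough numerical illustration of the two equations above, the following NumPy sketch builds Z from two hypothetical groups and arbitrarily chosen deltas, solves the unit-penalty ridge problem in closed form, and checks that the result matches a ridge on the unscaled features with a per-group penalty exp(-deltas[i]). It assumes that Z is the column-wise concatenation of the Z_i; the data, group sizes, and delta values are made up, and this is not himalaya's solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_targets = 50, 3
# Two hypothetical feature spaces (groups) with 4 and 2 features.
Xs = [rng.standard_normal((n_samples, 4)), rng.standard_normal((n_samples, 2))]
Y = rng.standard_normal((n_samples, n_targets))

# Arbitrary log group scalings, one per group (shared by all targets here).
deltas = np.array([0.5, -1.0])

# Z_i = exp(deltas[i] / 2) X_i, concatenated over groups (assumption).
Z = np.hstack([np.exp(d / 2) * X_i for d, X_i in zip(deltas, Xs)])

# b* = argmin_b ||Z @ b - Y||^2 + ||b||^2, in closed form.
b = np.linalg.solve(Z.T @ Z + np.eye(Z.shape[1]), Z.T @ Y)

# Coefficients on the unscaled features: w_i = exp(deltas[i] / 2) b_i.
scale = np.concatenate(
    [np.full(X_i.shape[1], np.exp(d / 2)) for d, X_i in zip(deltas, Xs)])
w = scale[:, None] * b

# Same solution from a ridge on X with per-group penalty exp(-deltas[i]).
X = np.hstack(Xs)
penalty = np.diag(1.0 / scale ** 2)
w_direct = np.linalg.solve(X.T @ X + penalty, X.T @ Y)

print(np.allclose(w, w_direct))
```

Read this way, a larger deltas[i] means a weaker penalty on group i, which is why the scalings act as per-group regularization hyperparameters.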
The solver optimizes the log group scalings deltas over cross-validation, using random search (solver="random_search").
- Parameters
- groups : array of shape (n_features, ), “input”, or None
Encoding of the group of each feature. If None, all features are gathered in one group, and the problem is equivalent to RidgeCV. If “input”, the input features X should be a list of 2D arrays, corresponding to each group.
- solver : str
Algorithm used during the fit. Only “random_search” is available for now.
- solver_params : dict or None
Additional parameters for the solver. For details, see the docstring of the function GroupRidgeCV.ALL_SOLVERS[solver].
- fit_intercept : boolean
Whether to fit an intercept. If False, X and Y must be zero-mean over samples.
- cv : int or scikit-learn splitter
Cross-validation splitter. If an int, KFold is used.
- random_state : int, or None
Random generator seed. Use an int for deterministic search.
- Y_in_cpu : bool
If True, keep the target values y in CPU memory (slower).
- force_cpu : bool
If True, computations will be performed on CPU, ignoring the current backend. If False, use the current backend.
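The groups and fit_intercept descriptions above can be illustrated with a small NumPy sketch. It splits a feature matrix into a list of 2D arrays from a per-feature group array (the layout that groups=“input” expects, per the description above) and centers X and Y over samples, as required when fit_intercept is False. The helper name split_by_group and the data are hypothetical; himalaya does its own handling internally.

```python
import numpy as np

def split_by_group(X, groups):
    """Split the columns of X into one 2D array per distinct group label."""
    groups = np.asarray(groups)
    return [X[:, groups == label] for label in np.unique(groups)]

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5)) + 3.0  # deliberately not zero-mean
Y = rng.standard_normal((10, 3))

# Per-feature encoding: features 0-2 belong to group 0, features 3-4 to group 1.
groups = np.array([0, 0, 0, 1, 1])

# With fit_intercept=False, X and Y must be zero-mean over samples (axis 0).
X_centered = X - X.mean(axis=0)
Y_centered = Y - Y.mean(axis=0)

# Equivalent list-of-arrays layout, one 2D array per group.
Xs = split_by_group(X_centered, groups)
print([X_i.shape for X_i in Xs])  # [(10, 3), (10, 2)]
```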
Examples
>>> from himalaya.ridge import GroupRidgeCV
>>> from himalaya.ridge import ColumnTransformerNoStack
>>> from sklearn.pipeline import make_pipeline

>>> # create a dataset
>>> import numpy as np
>>> n_samples, n_features, n_targets = 10, 5, 3
>>> X = np.random.randn(n_samples, n_features)
>>> Y = np.random.randn(n_samples, n_targets)

>>> # Separate the first three columns and the last two
>>> # columns, creating two groups of shape (n_samples, n_feature_i).
>>> from sklearn.preprocessing import StandardScaler
>>> ct = ColumnTransformerNoStack(
...     [("group_1", StandardScaler(), [0, 1, 2]),
...      ("group_2", StandardScaler(), slice(3, 5))])

>>> # A model with automatic groups, as output by ColumnTransformerNoStack
>>> model = GroupRidgeCV(groups="input")
>>> pipe = make_pipeline(ct, model)
>>> _ = pipe.fit(X, Y)
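For intuition about what a fit like the one above searches over, here is a simplified NumPy sketch of a random search over log group scalings with K-fold cross-validation. It is not himalaya's implementation: the Dirichlet sampling of group weights, the log-uniform sampling of alpha, the negative mean squared error used as score, and the helper names (ridge_fit, scale_groups) are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(Z, Y):
    """Closed-form ridge with unit penalty: argmin_b ||Z @ b - Y||^2 + ||b||^2."""
    return np.linalg.solve(Z.T @ Z + np.eye(Z.shape[1]), Z.T @ Y)

def scale_groups(Xs, deltas):
    """Z_i = exp(deltas[i] / 2) X_i, concatenated over groups."""
    return np.hstack([np.exp(d / 2) * X_i for d, X_i in zip(deltas, Xs)])

rng = np.random.default_rng(0)
n_samples, n_targets, n_iter, n_folds = 60, 3, 20, 5
Xs = [rng.standard_normal((n_samples, 3)), rng.standard_normal((n_samples, 2))]
Y = rng.standard_normal((n_samples, n_targets))

folds = np.array_split(rng.permutation(n_samples), n_folds)
candidates = np.empty((n_iter, len(Xs)))
cv_scores = np.empty((n_iter, n_targets))

for it in range(n_iter):
    # Assumed sampling scheme: group weights on the simplex, alpha log-uniform.
    gamma = rng.dirichlet(np.ones(len(Xs)))
    alpha = 10 ** rng.uniform(-2, 2)
    candidates[it] = np.log(gamma / alpha)

    fold_scores = []
    for val in folds:
        train = np.setdiff1d(np.arange(n_samples), val)
        Z_train = scale_groups([X_i[train] for X_i in Xs], candidates[it])
        Z_val = scale_groups([X_i[val] for X_i in Xs], candidates[it])
        pred = Z_val @ ridge_fit(Z_train, Y[train])
        # Negative mean squared error per target (higher is better, at most 0).
        fold_scores.append(-np.mean((pred - Y[val]) ** 2, axis=0))
    cv_scores[it] = np.mean(fold_scores, axis=0)

# Keep, for each target, the candidate with the best cross-validation score.
best = cv_scores.argmax(axis=0)
deltas = candidates[best].T  # shape (n_groups, n_targets)
print(deltas.shape, cv_scores.shape)  # (2, 3) (20, 3)
```

Selecting the best candidate separately for each target is what gives the deltas array one column per target in this sketch.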
- Attributes
- coef_ : array of shape (n_features) or (n_features, n_targets)
Ridge coefficients.
- intercept_ : float or array of shape (n_targets, )
Intercept. Only returned when fit_intercept is True.
- deltas_ : array of shape (n_groups, n_targets)
Log of the group scalings.
- cv_scores_ : array of shape (n_iter, n_targets)
Cross-validation scores, averaged over splits. By default, the scores are computed with l2_neg_loss (in ]-inf, 0]). The scoring function can be changed with solver_params[“score_func”].
- n_features_in_ : int
Number of features used during the fit.
- dtype_ : str
Dtype of input data.
- best_alphas_ : array of shape (n_targets, )
Equal to 1. / exp(self.deltas_).sum(0). For the “random_search” solver, it corresponds to the best hyperparameter alphas, assuming that each squared group scaling vector sums to one (in particular, this is the case when solver_params['n_iter'] is an integer).
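The relation between deltas_ and best_alphas_ can be checked with a few lines of NumPy. The sketch assumes, as a labeled assumption rather than something stated above, that each column of deltas is built as log(gamma / alpha), with group weights gamma summing to one over groups and one alpha per target. Under that parameterization, 1. / exp(deltas).sum(0) returns alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_targets = 3, 4

# Assumed parameterization: gamma sums to one over groups, for each target.
gamma = rng.dirichlet(np.ones(n_groups), size=n_targets).T  # (n_groups, n_targets)
alpha = np.array([0.1, 1.0, 10.0, 100.0])                   # (n_targets,)

deltas = np.log(gamma / alpha)  # log group scalings, (n_groups, n_targets)

# Same expression as in the best_alphas_ description.
best_alphas = 1.0 / np.exp(deltas).sum(0)

print(np.allclose(best_alphas, alpha))
```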
Methods
- fit(X[, y])
Fit the model.
- get_params([deep])
Get parameters for this estimator.
- predict(X[, split])
Predict using the model.
- score(X, y[, split])
Return the coefficient of determination R^2 of the prediction.
- set_params(**params)
Set the parameters of this estimator.
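For reference, the coefficient of determination mentioned for score can be written out in NumPy as below. This is a generic R^2 sketch with a hypothetical helper name (r2_per_target); whether himalaya returns one value per target or an aggregate is not stated above, so the per-target shape here is an assumption.

```python
import numpy as np

def r2_per_target(Y_true, Y_pred):
    """R^2 = 1 - SS_res / SS_tot, computed separately for each target column."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
Y_true = rng.standard_normal((20, 3))

# A perfect prediction scores 1; predicting each target's mean scores 0.
perfect = r2_per_target(Y_true, Y_true)
baseline = r2_per_target(Y_true, np.tile(Y_true.mean(axis=0), (20, 1)))
print(perfect, baseline)
```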