Model descriptions
==================

This package implements a number of models.

Ridge
-----

Let :math:`X\in \mathbb{R}^{n\times p}` be a feature matrix with :math:`n`
samples and :math:`p` features, :math:`y\in \mathbb{R}^n` a target vector, and
:math:`\alpha > 0` a fixed regularization hyperparameter. Ridge regression [1]_
defines the weight vector :math:`b^*\in \mathbb{R}^p` as:

.. math::
    b^* = \arg\min_b \|Xb - y\|_2^2 + \alpha \|b\|_2^2.

This problem has a closed-form solution :math:`b^* = M y`, where
:math:`M = (X^\top X + \alpha I_p)^{-1}X^\top \in \mathbb{R}^{p \times n}`.

.. admonition:: This model is implemented in

    - :class:`~himalaya.ridge.Ridge` (scikit-learn-compatible estimator)
    - :func:`~himalaya.ridge.solve_ridge_svd` (function)

KernelRidge
-----------

By the Woodbury matrix identity, :math:`b^*` can be written as
:math:`b^* = X^\top(XX^\top + \alpha I_n)^{-1}y`, or :math:`b^* = X^\top w^*`
for some :math:`w^*\in \mathbb{R}^n`. Denoting the linear kernel by
:math:`K = X X^\top \in \mathbb{R}^{n\times n}`, this leads to the
*equivalent* formulation:

.. math::
    w^* = \arg\min_w \|Kw - y\|_2^2 + \alpha w^\top Kw.

This model can be extended to arbitrary positive semidefinite kernels
:math:`K`, leading to the more general kernel ridge regression [2]_.

.. admonition:: This model is implemented in

    - :class:`~himalaya.kernel_ridge.KernelRidge` (scikit-learn-compatible estimator)
    - :func:`~himalaya.kernel_ridge.solve_kernel_ridge_eigenvalues` (function)
    - :func:`~himalaya.kernel_ridge.solve_kernel_ridge_gradient_descent` (function)
    - :func:`~himalaya.kernel_ridge.solve_kernel_ridge_conjugate_gradient` (function)

RidgeCV and KernelRidgeCV
-------------------------

In practice, because the ridge regression and kernel ridge regression
hyperparameter :math:`\alpha` is unknown, it is typically selected through a
grid search with cross-validation. In cross-validation, we split the data set
into a training set :math:`(X_{train}, y_{train})` and a validation set
:math:`(X_{val}, y_{val})`. Then, we train the model on the training set, and
evaluate the generalization performance on the validation set. We perform this
process for multiple hyperparameter candidates :math:`\alpha`, typically
defined over a grid of log-spaced values. Finally, we keep the candidate
leading to the best generalization performance, as measured by the validation
loss, averaged over all cross-validation splits.

.. admonition:: These models are implemented in

    - :class:`~himalaya.ridge.RidgeCV` (scikit-learn-compatible estimator)
    - :func:`~himalaya.ridge.solve_ridge_cv_svd` (function)
    - :class:`~himalaya.kernel_ridge.KernelRidgeCV` (scikit-learn-compatible estimator)
    - :func:`~himalaya.kernel_ridge.solve_kernel_ridge_cv_eigenvalues` (function)

GroupRidgeCV / BandedRidgeCV
----------------------------

In some applications, features are naturally organized into groups (or feature
spaces) :math:`X_1, \ldots, X_m`. To adapt the regularization level to each
feature space, ridge regression can be extended to group-regularized ridge
regression (also known as banded ridge regression [3]_). In this model, a
separate hyperparameter :math:`\alpha_i` is optimized for each feature space:

.. math::
    b^* = \arg\min_b \left\|\sum_{i=1}^m X_i b_i - y\right\|_2^2
    + \sum_{i=1}^m \alpha_i \|b_i\|_2^2.

This is equivalent to solving a ridge regression with unit regularization:

.. math::
    \tilde{b}^* = \arg\min_{\tilde{b}} \|Z \tilde{b} - y\|_2^2
    + \|\tilde{b}\|_2^2,

where :math:`Z = [Z_1, \ldots, Z_m]` concatenates the feature spaces, each
scaled by a group scaling :math:`Z_i = e^{\delta_i / 2} X_i` with
:math:`\delta_i = - \log(\alpha_i)`. The original weights are recovered as
:math:`b_i^* = e^{\delta_i / 2} \tilde{b}_i^*`. The hyperparameters
:math:`\delta_i` are then learned over cross-validation.

.. admonition:: This model is implemented in

    - :class:`~himalaya.ridge.GroupRidgeCV` (scikit-learn-compatible estimator)
    - :func:`~himalaya.ridge.solve_group_ridge_random_search` (function)

    See also multiple-kernel ridge regression, which is equivalent to
    group-regularized ridge regression when using one linear kernel per group
    of features:

    - :class:`~himalaya.kernel_ridge.MultipleKernelRidgeCV` (scikit-learn-compatible estimator)
    - :func:`~himalaya.kernel_ridge.solve_multiple_kernel_ridge_random_search` (function)
    - :func:`~himalaya.kernel_ridge.solve_multiple_kernel_ridge_hyper_gradient` (function)

.. note::
    "Group ridge regression" is also sometimes called "Banded ridge
    regression".

WeightedKernelRidge
-------------------

To extend kernel ridge to group-regularization, we can compute the kernel as a
weighted sum of multiple kernels, :math:`K = \sum_{i=1}^m e^{\delta_i} K_i`.
Then, we can use :math:`K_i = X_i X_i^\top` for different groups of features
:math:`X_i`. The model becomes:

.. math::
    w^* = \arg\min_w \left\|\sum_{i=1}^m e^{\delta_i} K_{i} w - y\right\|_2^2
    + \alpha \sum_{i=1}^m e^{\delta_i} w^\top K_{i} w.

This model is called weighted kernel ridge regression. Here, the
log-kernel-weights :math:`\delta_i` are fixed. When all the targets use the
same log-kernel-weights, a single weighted kernel can be precomputed and used
in a kernel ridge regression. However, when the log-kernel-weights are
different for each target, the kernel sum cannot be precomputed, and fitting
the model requires dedicated algorithms.

.. admonition:: This model is implemented in

    - :class:`~himalaya.kernel_ridge.WeightedKernelRidge` (scikit-learn-compatible estimator)
    - :func:`~himalaya.kernel_ridge.solve_weighted_kernel_ridge_gradient_descent` (function)
    - :func:`~himalaya.kernel_ridge.solve_weighted_kernel_ridge_conjugate_gradient` (function)
    - :func:`~himalaya.kernel_ridge.solve_weighted_kernel_ridge_neumann_series` (function)

MultipleKernelRidgeCV
---------------------

In weighted kernel ridge regression, when the log-kernel-weights
:math:`\delta_i` are unknown, we can learn them over cross-validation. This
model is called multiple-kernel ridge regression. When the kernels are defined
by :math:`K_i = X_i X_i^\top` for different groups of features :math:`X_i`,
multiple-kernel ridge regression is equivalent to group ridge regression
(also known as banded ridge regression).

.. admonition:: This model is implemented in

    - :class:`~himalaya.kernel_ridge.MultipleKernelRidgeCV` (scikit-learn-compatible estimator)
    - :func:`~himalaya.kernel_ridge.solve_multiple_kernel_ridge_hyper_gradient` (function)
    - :func:`~himalaya.kernel_ridge.solve_multiple_kernel_ridge_random_search` (function)

.. include:: flowchart.rst

References
~~~~~~~~~~

.. [1] Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased
   estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

.. [2] Saunders, C., Gammerman, A., & Vovk, V. (1998). Ridge regression
   learning algorithm in dual variables.

.. [3] Nunez-Elizalde, A. O., Huth, A. G., & Gallant, J. L. (2019). Voxelwise
   encoding models with non-spherical multivariate normal priors. Neuroimage,
   197, 482-492.
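
Numerical sketch
~~~~~~~~~~~~~~~~

The equivalences described above can be checked numerically. The following is
a minimal NumPy sketch, not the package's implementation: the helper names
(``ridge_primal``, ``kernel_ridge_dual``, ``select_alpha_cv``), the data sizes,
and the hyperparameter values are all hypothetical. It checks that ridge and
linear-kernel ridge give the same weights, that weighted kernel ridge with
fixed :math:`\delta_i` matches a banded ridge with
:math:`\alpha_i = \alpha e^{-\delta_i}` (a relation derived here from the
Woodbury identity), and it selects :math:`\alpha` over a log-spaced grid with a
plain K-fold split.

```python
import numpy as np


def ridge_primal(X, y, alpha):
    """Closed-form ridge weights b = (X^T X + alpha I_p)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def kernel_ridge_dual(K, y, alpha):
    """Dual weights w = (K + alpha I_n)^{-1} y for a precomputed kernel K."""
    n = K.shape[0]
    return np.linalg.solve(K + alpha * np.eye(n), y)


def select_alpha_cv(X, y, alphas, n_splits=5):
    """Return the alpha with the lowest mean validation loss over K folds."""
    indices = np.arange(X.shape[0])
    folds = np.array_split(indices, n_splits)
    mean_losses = []
    for alpha in alphas:
        losses = []
        for val in folds:
            train = np.setdiff1d(indices, val)
            b = ridge_primal(X[train], y[train], alpha)
            losses.append(np.mean((X[val] @ b - y[val]) ** 2))
        mean_losses.append(np.mean(losses))
    return alphas[int(np.argmin(mean_losses))]


# Hypothetical toy data: two feature spaces X1 and X2, one target y.
rng = np.random.default_rng(0)
n, p1, p2, alpha = 30, 4, 6, 0.5
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack([X1, X2])
y = rng.standard_normal(n)

# 1) Ridge and linear-kernel ridge give the same primal weights, b = X^T w.
b_ridge = ridge_primal(X, y, alpha)
w = kernel_ridge_dual(X @ X.T, y, alpha)
b_kernel = X.T @ w

# 2) Weighted kernel ridge with fixed log-kernel-weights delta_i matches
#    banded ridge with per-group penalties alpha_i = alpha * exp(-delta_i).
deltas = np.array([-1.0, 2.0])
K = np.exp(deltas[0]) * X1 @ X1.T + np.exp(deltas[1]) * X2 @ X2.T
w_weighted = kernel_ridge_dual(K, y, alpha)
b_weighted = np.concatenate([
    np.exp(deltas[0]) * X1.T @ w_weighted,
    np.exp(deltas[1]) * X2.T @ w_weighted,
])
penalties = np.repeat(alpha * np.exp(-deltas), [p1, p2])
b_banded = np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# 3) Grid search over log-spaced alpha candidates with cross-validation.
alphas = np.logspace(-3, 3, 7)
best_alpha = select_alpha_cv(X, y, alphas)

print(np.allclose(b_ridge, b_kernel), np.allclose(b_weighted, b_banded))
print(best_alpha)
```

The sketch solves dense linear systems directly, which is only practical for
small problems; the solvers listed in the admonitions above are the ones to use
in practice.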