himalaya.ridge.ColumnTransformerNoStack
class himalaya.ridge.ColumnTransformerNoStack(transformers, *, remainder='drop', sparse_threshold=0.3, n_jobs=None, transformer_weights=None, verbose=False, verbose_feature_names_out=True)
Applies transformers to columns of an array, and does not stack them.
This estimator allows different columns or column subsets of the input to be transformed separately. The different groups of features generated are not stacked together, so that they can be used, e.g., in a GroupRidgeCV(groups="auto"). This is useful to perform separate transformations on different feature spaces.
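To make "not stacked" concrete, here is a minimal NumPy sketch contrasting the two kinds of output. It is not himalaya's implementation: the helper `split_columns` and the column groups are hypothetical, chosen only for illustration.

```python
import numpy as np

def split_columns(X, column_groups):
    """Hypothetical helper: return one matrix per column group, without stacking."""
    return [X[:, cols] for cols in column_groups]

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5))
groups = [[0, 1, 2], slice(3, 5)]  # first three columns, then the last two

Xs = split_columns(X, groups)
print([Xi.shape for Xi in Xs])  # [(10, 3), (10, 2)]

# A stacking transformer would instead return a single (n_samples, n_features) array.
X_stacked = np.hstack(Xs)
print(X_stacked.shape)  # (10, 5)
```

Keeping the list form preserves the boundary between feature spaces, which a single stacked array would lose.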
Warning: This class does not perfectly abide by scikit-learn's API. Indeed, it returns a list of n_groups matrices of shape (n_samples, n_features_i), while scikit-learn's API only allows arrays of shape (n_samples, n_features). This class is intended to be used in a scikit-learn pipeline just before a GroupRidgeCV(groups="auto").
Parameters
- transformers : list of tuples
List of (name, transformer, columns) tuples specifying the transformer objects to be applied to subsets of the data.
  - name : str
  Like in Pipeline and FeatureUnion, this allows the transformer and its parameters to be set using set_params and searched in grid search.
  - transformer : {'drop', 'passthrough'} or estimator
  Estimator must support fit and transform. Special-cased strings 'drop' and 'passthrough' are accepted as well, to indicate to drop the columns or to pass them through untransformed, respectively.
  - columns : str, array-like of str, int, array-like of int, array-like of bool, slice or callable
  Indexes the data on its second axis. Integers are interpreted as positional columns, while strings can reference DataFrame columns by name. A scalar string or int should be used where transformer expects X to be a 1d array-like (vector), otherwise a 2d array will be passed to the transformer. A callable is passed the input data X and can return any of the above. To select multiple columns by name or dtype, you can use make_column_selector.
- remainder : {'drop', 'passthrough'} or estimator, default='drop'
By default, only the specified columns in transformers are transformed and returned in the output, and the non-specified columns are dropped (default of 'drop'). By specifying remainder='passthrough', all remaining columns that were not specified in transformers will be automatically passed through. Since outputs are not stacked, this subset of columns is returned as an additional matrix after the outputs of the transformers. By setting remainder to be an estimator, the remaining non-specified columns will use the remainder estimator. The estimator must support fit and transform. Note that using this feature requires that the DataFrame columns input at fit and transform have identical order.
- n_jobs : int, default=None
Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. n_jobs does not work with GPU backends.
- transformer_weights : dict, default=None
Multiplicative weights for features per transformer. The output of each transformer is multiplied by its weight. Keys are transformer names, values are the weights.
- verbose : bool, default=False
If True, the time elapsed while fitting each transformer will be printed as it is completed.
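The column specifications and transformer_weights described above can be illustrated with a plain NumPy sketch. This is not himalaya's code: `resolve_columns` and `apply_no_stack` are hypothetical helpers, string and DataFrame column names are left out because they need pandas, and an identity function stands in for an estimator with fit and transform.

```python
import numpy as np

def resolve_columns(X, columns):
    """Hypothetical helper: turn a column specification into an index for X[:, idx]."""
    if callable(columns):
        # A callable receives X and returns one of the other accepted forms.
        columns = columns(X)
    return columns

def apply_no_stack(X, transformers, transformer_weights=None):
    """Hypothetical helper: apply (name, func, columns) triples, one output per group."""
    weights = transformer_weights or {}
    outputs = []
    for name, func, columns in transformers:
        Xi = func(X[:, resolve_columns(X, columns)])
        # Multiply the group's output by its weight; unlisted names keep weight 1.
        outputs.append(Xi * weights.get(name, 1.0))
    return outputs

def identity(A):
    return A

X = np.arange(12, dtype=float).reshape(3, 4)
transformers = [
    ("ints", identity, [0, 1]),                                 # list of positions
    ("mask", identity, np.array([False, False, True, False])),  # boolean mask
    ("last", identity, lambda A: slice(A.shape[1] - 1, None)),  # callable returning a slice
    ("scalar", identity, 0),                                    # scalar int gives a 1d vector
]
Xs = apply_no_stack(X, transformers, transformer_weights={"mask": 2.0})
print([Xi.shape for Xi in Xs])  # [(3, 2), (3, 1), (3, 1), (3,)]
```

The last entry shows why the documentation distinguishes scalar specifications: indexing with a plain int drops the second axis.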
See also
himalaya.ridge.make_column_transformer_no_stack
convenience function for combining the outputs of multiple pipelines applied to column subsets of the original feature space.
sklearn.compose.make_column_selector
convenience function for selecting columns based on datatype or the columns name with a regex pattern.
Notes
The order of the feature spaces in the returned list follows the order in which the columns are specified in the transformers list. Columns of the original feature matrix that are not specified are dropped from the output, unless remainder='passthrough' is used. In that case, the remaining columns are returned after the outputs of the transformers.
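A minimal NumPy sketch of this ordering and of remainder='passthrough' follows. It assumes, as the description of transformers_ suggests, that the remainder is handled as a final ('remainder', ...) entry, so its output comes last in the list. `no_stack_with_remainder` is a hypothetical helper, not himalaya's implementation.

```python
import numpy as np

def no_stack_with_remainder(X, column_groups, remainder="drop"):
    """Hypothetical helper mimicking output order and remainder handling."""
    used = np.zeros(X.shape[1], dtype=bool)
    outputs = []
    for cols in column_groups:
        # Outputs follow the order in which the groups are listed.
        outputs.append(X[:, cols])
        used[cols] = True
    remaining = np.flatnonzero(~used)
    if remainder == "passthrough" and remaining.size > 0:
        # Unselected columns, in their original order, form one extra final group.
        outputs.append(X[:, remaining])
    return outputs

X = np.arange(18).reshape(3, 6)
groups = [[4, 5], [0, 2]]  # deliberately not in column order
print([Xi.shape for Xi in no_stack_with_remainder(X, groups)])
# [(3, 2), (3, 2)]
print([Xi.shape for Xi in no_stack_with_remainder(X, groups, "passthrough")])
# [(3, 2), (3, 2), (3, 2)]
```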
Examples
>>> import numpy as np
>>> from himalaya.ridge import ColumnTransformerNoStack
>>> from sklearn.preprocessing import StandardScaler
>>> ct = ColumnTransformerNoStack(
...     [("group_1", StandardScaler(), [0, 1, 2]),
...      ("group_2", StandardScaler(), slice(3, 5))])
>>> X = np.random.randn(10, 5)
>>> # Group separately the first three columns and the last two
>>> # columns, creating two feature spaces.
>>> Xs = ct.fit_transform(X)
>>> print([Xi.shape for Xi in Xs])
[(10, 3), (10, 2)]
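If a downstream step instead expects one stacked matrix plus a per-feature group label, the list returned above carries enough information to build it. The NumPy sketch below is illustrative only: `stack_with_group_labels` is a hypothetical helper, not part of himalaya, and it does not describe how GroupRidgeCV handles groups="auto".

```python
import numpy as np

def stack_with_group_labels(Xs):
    """Hypothetical helper: stack a list of feature spaces and label each column by group."""
    X_stacked = np.hstack(Xs)
    # One integer label per column, equal to the index of its feature space.
    groups = np.concatenate([np.full(Xi.shape[1], i) for i, Xi in enumerate(Xs)])
    return X_stacked, groups

rng = np.random.default_rng(0)
# Random stand-ins with the same shapes as the example output above.
Xs = [rng.standard_normal((10, 3)), rng.standard_normal((10, 2))]
X_stacked, groups = stack_with_group_labels(Xs)
print(X_stacked.shape, groups)  # (10, 5) [0 0 0 1 1]
```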
Attributes
- transformers_ : list
The collection of fitted transformers as tuples of (name, fitted_transformer, column). fitted_transformer can be an estimator, 'drop', or 'passthrough'. In case there were no columns selected, this will be the unfitted transformer. If there are remaining columns, the final element is a tuple of the form ('remainder', transformer, remaining_columns) corresponding to the remainder parameter. If there are remaining columns, then len(transformers_) == len(transformers) + 1, otherwise len(transformers_) == len(transformers).
- named_transformers_ : Bunch
Access the fitted transformer by name.
- n_features_in_ : int
Number of features used during the fit.
- sparse_output_ : False
Always False for this class.
Methods
- fit(X[, y]) : Fit all transformers using X.
- fit_transform(X[, y]) : Fit all transformers, transform the data and return the results as a list, without concatenating them.
- get_feature_names() : DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2.
- get_feature_names_out([input_features]) : Get output feature names for transformation.
- get_params([deep]) : Get parameters for this estimator.
- set_params(**kwargs) : Set the parameters of this estimator.
- transform(X) : Transform X separately by each transformer and return the results as a list, without concatenating them.
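The split between fit and transform matters when the fitted object is applied to new data: statistics are learned once on the training set, then reused. Below is a minimal NumPy sketch under that assumption, with per-group standardization standing in for StandardScaler. `GroupScalerNoStack` is a hypothetical class, not how himalaya implements ColumnTransformerNoStack.

```python
import numpy as np

class GroupScalerNoStack:
    """Hypothetical analogue: standardize each column group separately, return a list."""

    def __init__(self, column_groups):
        self.column_groups = column_groups

    def fit(self, X, y=None):
        # Store one (mean, std) pair per group, learned from the training data only.
        self.stats_ = [(X[:, cols].mean(axis=0), X[:, cols].std(axis=0))
                       for cols in self.column_groups]
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        # Reuse the training statistics; outputs are returned as a list, not concatenated.
        return [(X[:, cols] - mean) / std
                for cols, (mean, std) in zip(self.column_groups, self.stats_)]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 5))
X_test = rng.standard_normal((4, 5))
scaler = GroupScalerNoStack([[0, 1, 2], slice(3, 5)])
Xs_train = scaler.fit_transform(X_train)
Xs_test = scaler.transform(X_test)
print([Xi.shape for Xi in Xs_test])  # [(4, 3), (4, 2)]
```

Because transform only reads stored statistics, the test set is scaled with the training means and standard deviations, which is the behavior expected when the object sits inside a scikit-learn pipeline.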