yikit.feature_selection package
Feature selection methods for machine learning.
This module provides filter methods and wrapper methods for feature selection, including correlation-based filtering and the Boruta algorithm.
- class yikit.feature_selection.BorutaPy(estimator, n_estimators: int | Literal['auto'] = 'auto', perc: float | Literal['auto'] = 'auto', alpha: float = 0.05, two_step: bool = True, max_iter: int = 100, random_state: RandomState | int | None = None, verbose: int = 1, max_shuf: int = 10000, n_jobs: int | None = None)
Bases: BorutaPy
- fit(X, y)
This docstring is reused from scikit-learn-contrib/boruta_py, the BSD 3-Clause licensed project whose BorutaPy class this class inherits from.
Fits the Boruta feature selection with the provided estimator.
- Parameters:
X (array-like, shape = [n_samples, n_features]) – The training input samples.
y (array-like, shape = [n_samples]) – The target values.
- get_support(weak: bool = False) → ndarray
Get a mask, or integer index, of the features selected.
- Parameters:
weak (bool, optional) – If set to True, the tentative features are also included in the support mask. If False, only confirmed features are included. By default False.
- Returns:
support – A boolean mask of the features selected. If weak=True, includes both confirmed and tentative features. If weak=False, includes only confirmed features.
- Return type:
array of shape [n_features]
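The effect of the weak flag described above can be illustrated with a small sketch. It is not the library's implementation: combine_support and the two input masks are hypothetical names introduced here, and the sketch only assumes that confirmed and tentative features are tracked as two separate boolean masks of equal length.

```python
from typing import List


def combine_support(
    confirmed: List[bool], tentative: List[bool], weak: bool = False
) -> List[bool]:
    """Hypothetical helper: build a per-feature mask from two masks.

    Tentative features are included only when weak=True.
    """
    if len(confirmed) != len(tentative):
        raise ValueError("masks must have the same length")
    if not weak:
        # Strict mode: confirmed features only.
        return list(confirmed)
    # Weak mode: a feature is kept if it is confirmed or tentative.
    return [c or t for c, t in zip(confirmed, tentative)]


# Illustrative masks for four features (made-up values).
confirmed = [True, False, False, True]
tentative = [False, True, False, False]

strict_mask = combine_support(confirmed, tentative)
weak_mask = combine_support(confirmed, tentative, weak=True)
```

With these inputs, the strict mask keeps the first and last features, and the weak mask additionally keeps the second.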
- set_transform_request(*, return_df: bool | None | str = '$UNCHANGED$', weak: bool | None | str = '$UNCHANGED$') → BorutaPy
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to transform.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
return_df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_df parameter in transform.
weak (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weak parameter in transform.
- Returns:
self – The updated object.
- Return type:
object
- class yikit.feature_selection.FilterSelector(r: float = 0.9, alpha: float = 0.05, verbose: int | bool = True, n_jobs: int | None = None)
Bases: SelectorMixin, BaseEstimator
- fit(X, y=None)
Fit the FilterSelector.
This method identifies features in X that are highly correlated with one another, meaning their correlation is above the threshold r and the associated p-value is below alpha. Features that are highly correlated with others are removed recursively, so that only minimally redundant features are kept.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix. Can be a pandas DataFrame, 2D numpy array, or 2D list.
y (Ignored) – Not used, present for API consistency.
- Returns:
self – Returns the instance itself.
- Return type:
object