yikit.feature_selection package

Feature selection methods for machine learning.

This module provides filter and wrapper methods for feature selection, including correlation-based filtering and the Boruta algorithm.

class yikit.feature_selection.BorutaPy(estimator, n_estimators: int | Literal['auto'] = 'auto', perc: float | Literal['auto'] = 'auto', alpha: float = 0.05, two_step: bool = True, max_iter: int = 100, random_state: RandomState | int | None = None, verbose: int = 1, max_shuf: int = 10000, n_jobs: int | None = None)

Bases: boruta.BorutaPy

fit(X, y)

This docstring is reused from scikit-learn-contrib/boruta_py, whose BorutaPy class this class inherits from, and is distributed under the BSD 3-Clause license.

Fits the Boruta feature selection with the provided estimator.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – The training input samples.

  • y (array-like, shape = [n_samples]) – The target values.
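The core idea behind Boruta can be sketched in a few lines. This is a simplified illustration, not the code used by yikit or boruta_py: each iteration appends shuffled "shadow" copies of every feature, fits a scikit-learn RandomForestClassifier, and counts a hit when a real feature's importance exceeds the largest shadow importance. After n_iter rounds, one-sided binomial tests against chance (p = 0.5) label each feature confirmed, rejected, or tentative. The function name boruta_sketch is hypothetical, and the sketch deliberately omits parts of the real algorithm, such as dropping rejected features between iterations, the perc percentile threshold, and the two_step multiple-testing correction.

```python
import numpy as np
from scipy.stats import binom
from sklearn.ensemble import RandomForestClassifier


def boruta_sketch(X, y, n_iter=20, alpha=0.05, random_state=0):
    """Hypothetical, simplified Boruta: compare each feature with shuffled shadow copies."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)

    for _ in range(n_iter):
        # Shadow features: each column permuted independently, breaking any link to y
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(
            n_estimators=50, random_state=int(rng.integers(1_000_000))
        )
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_
        # A hit means the real feature beat the best shadow feature in this round
        hits += importances[:n_features] > importances[n_features:].max()

    # One-sided binomial tests against chance; no multiple-testing correction here
    p_confirm = binom.sf(hits - 1, n_iter, 0.5)  # P(H >= hits)
    p_reject = binom.cdf(hits, n_iter, 0.5)      # P(H <= hits)
    return np.where(
        p_confirm < alpha,
        "confirmed",
        np.where(p_reject < alpha, "rejected", "tentative"),
    )
```

On data where only the first two columns drive y, for example y = (X[:, 0] + X[:, 1] > 0), the informative columns beat every shadow in every round and come out confirmed.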

get_support(weak: bool = False) → ndarray

Get a boolean mask of the selected features.

Parameters:

weak (bool, optional) – If set to True, the tentative features are also included in the support mask. If False, only confirmed features are included. By default False.

Returns:

support – A boolean mask of the features selected. If weak=True, includes both confirmed and tentative features. If weak=False, includes only confirmed features.

Return type:

array of shape [n_features]
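The boruta_py parent class stores confirmed features in the fitted attribute support_ and tentative features in support_weak_. Assuming get_support(weak=True) simply takes the union of the two, the behaviour described above reduces to an element-wise OR, as in this hypothetical helper combine_support (not part of yikit):

```python
import numpy as np


def combine_support(support, support_weak, weak=False):
    """Hypothetical helper: confirmed mask, optionally OR-ed with the tentative mask."""
    support = np.asarray(support, dtype=bool)
    if weak:
        # Include tentative features alongside confirmed ones
        return support | np.asarray(support_weak, dtype=bool)
    # Confirmed features only
    return support.copy()
```

With confirmed = [True, False, False] and tentative = [False, True, False], weak=False keeps only the first feature, while weak=True keeps the first two.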

set_transform_request(*, return_df: bool | None | str = '$UNCHANGED$', weak: bool | None | str = '$UNCHANGED$') → BorutaPy

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • return_df (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_df parameter in transform.

  • weak (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for weak parameter in transform.

Returns:

self – The updated object.

Return type:

object

class yikit.feature_selection.FilterSelector(r: float = 0.9, alpha: float = 0.05, verbose: int | bool = True, n_jobs: int | None = None)

Bases: SelectorMixin, BaseEstimator

fit(X, y=None)

Fit the FilterSelector.

This method identifies pairs of features in X whose correlation exceeds the threshold r and whose p-value is below alpha, that is, pairs that are both strongly and significantly correlated. Features involved in such pairs are removed recursively until only minimally redundant features remain.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix. Can be a pandas DataFrame, 2D numpy array, or 2D list.

  • y (Ignored) – Not used, present for API consistency.

Returns:

self – Returns the instance itself.

Return type:

object
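A correlation filter of this kind can be sketched with scipy.stats.pearsonr. The removal rule here is an assumption, since the text above does not specify it: pairs with |correlation| > r and p-value < alpha are marked redundant, and the feature with the most redundant partners is dropped repeatedly (ties go to the later column) until no redundant pair remains. The function name correlation_filter_mask is hypothetical and this is not yikit's implementation.

```python
import numpy as np
from scipy.stats import pearsonr


def correlation_filter_mask(X, r=0.9, alpha=0.05):
    """Hypothetical sketch: mask of features kept after greedy redundancy removal."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # redundant[i, j] is True when |corr(i, j)| > r and the correlation is significant
    redundant = np.zeros((n_features, n_features), dtype=bool)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            corr, p_value = pearsonr(X[:, i], X[:, j])
            if abs(corr) > r and p_value < alpha:
                redundant[i, j] = redundant[j, i] = True

    keep = np.ones(n_features, dtype=bool)
    while True:
        # Count redundant partners among features that are still kept
        active = redundant & keep[:, None] & keep[None, :]
        counts = active.sum(axis=1)
        if counts.max() == 0:
            break
        # Drop the feature with the most partners; ties go to the last column
        worst = n_features - 1 - int(np.argmax(counts[::-1]))
        keep[worst] = False
    return keep
```

For columns a, 2a + sin(a), and an alternating ±1 sequence, only the first two are strongly correlated, so the second column is dropped and the mask is [True, False, True].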