yikit.models package#

Machine learning model wrappers and ensemble methods.

This module provides scikit-learn compatible regressors including ensemble methods, linear models, support vector machines, gradient boosting, neural networks, and hyperparameter optimization utilities.

class yikit.models.EnsembleRegressor(estimators=(RandomForestRegressor(),), method='blending', cv=5, n_jobs=-1, random_state=None, scoring='neg_mean_squared_error', verbose=0, boruta=True, opt=True)#

Bases: BaseEstimator, RegressorMixin

Ensemble regressor combining multiple base estimators.

This class provides ensemble methods for regression including blending, averaging, and stacking. It supports optional feature selection using Boruta and hyperparameter optimization using Optuna.

Parameters:
  • estimators (list of estimator objects, default=(RandomForestRegressor(),)) – List of base estimators to ensemble. Each estimator must be a scikit-learn compatible regressor.

  • method ({'blending', 'average', 'stacking'}, default='blending') – Ensemble method to use:

    - ‘blending’: train each estimator on a different fold and combine predictions

    - ‘average’: simple average of predictions from all estimators

    - ‘stacking’: use a meta-learner to combine predictions

  • cv (int, cross-validation generator or iterable, default=5) – Determines the cross-validation splitting strategy.

  • n_jobs (int, default=-1) – Number of jobs to run in parallel.

  • random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.

  • scoring (str, callable or list, default='neg_mean_squared_error') – Scoring metric(s) to use. See scikit-learn documentation for options.

  • verbose (int, default=0) – Verbosity level.

  • boruta (bool, default=True) – Whether to perform Boruta feature selection before training.

  • opt (bool, default=True) – Whether to perform hyperparameter optimization using Optuna.
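
As a concrete illustration of the ‘average’ strategy above, here is a minimal sketch using plain scikit-learn and NumPy (a sketch of the concept, not yikit's internal code path):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = X[:, 0] + 0.1 * rng.randn(100)

# Fit each base estimator independently, then average their predictions.
estimators = [RandomForestRegressor(n_estimators=50, random_state=0), Ridge()]
preds = np.column_stack([est.fit(X, y).predict(X) for est in estimators])
y_avg = preds.mean(axis=1)
print(y_avg.shape)  # (100,)
```

The ‘stacking’ strategy replaces the plain mean with a meta-learner trained on the base predictions (compare scikit-learn's own StackingRegressor).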

estimators_#

The fitted base estimators.

Type: list of fitted estimators

meta_estimator_#

The meta-learner used in stacking (None for other methods).

Type: estimator or None

n_features_in_#

Number of features seen during fit.

Type: int

feature_names_in_#

Names of features seen during fit.

Type: ndarray of shape (n_features_in_,)

Examples

>>> from yikit.models import EnsembleRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> ensemble = EnsembleRegressor(
...     estimators=[RandomForestRegressor(), Ridge()],
...     method='blending',
...     cv=5
... )
>>> _ = ensemble.fit(X, y)
>>> predictions = ensemble.predict(X)
fit(X, y)#
predict(X)#
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → EnsembleRegressor#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class yikit.models.GBDTRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)#

Bases: RegressorMixin, BaseEstimator

Gradient Boosting Decision Tree regressor using LightGBM.

This class provides a scikit-learn compatible wrapper for LightGBM’s gradient boosting decision tree regressor. It includes automatic early stopping using a validation set and supports all LightGBM parameters.

Parameters:
  • boosting_type (str, default='gbdt') – Type of boosting algorithm to use.

  • num_leaves (int, default=31) – Maximum tree leaves for base learners.

  • max_depth (int, default=-1) – Maximum tree depth for base learners, <=0 means no limit.

  • learning_rate (float, default=0.1) – Boosting learning rate.

  • n_estimators (int, default=100) – Number of boosted trees to fit.

  • subsample_for_bin (int, default=200000) – Number of samples for constructing bins.

  • objective (str or None, default=None) – Specify the learning task and the corresponding learning objective.

  • class_weight (dict, 'balanced' or None, default=None) – Weights associated with classes.

  • min_split_gain (float, default=0.0) – Minimum loss reduction required to make a further partition.

  • min_child_weight (float, default=0.001) – Minimum sum of instance weight (hessian) needed in a child.

  • min_child_samples (int, default=20) – Minimum number of data points needed in a child (leaf).

  • subsample (float, default=1.0) – Subsample ratio of the training instance.

  • subsample_freq (int, default=0) – Frequency of subsampling; <=0 disables it.

  • colsample_bytree (float, default=1.0) – Subsample ratio of columns when constructing each tree.

  • reg_alpha (float, default=0.0) – L1 regularization term on weights.

  • reg_lambda (float, default=0.0) – L2 regularization term on weights.

  • random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.

  • n_jobs (int, default=-1) – Number of parallel threads.

  • silent (bool, default=True) – Whether to print messages while running boosting.

  • importance_type (str, default='split') – The type of feature importance to be filled in feature_importances_.

  • **kwargs – Additional keyword arguments passed to LGBMRegressor.

estimator_#

The fitted LightGBM regressor.

Type: LGBMRegressor

feature_importances_#

The feature importances.

Type: array-like of shape (n_features,)

n_features_in_#

Number of features seen during fit.

Type: int

rng_#

Random state instance used for reproducibility.

Type: RandomState

Examples

>>> from yikit.models import GBDTRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = GBDTRegressor(n_estimators=100, learning_rate=0.1)
>>> _ = model.fit(X, y)
>>> predictions = model.predict(X)

Notes

This class requires the ‘lightgbm’ package to be installed.
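
The automatic early stopping mentioned above is handled internally via LightGBM's validation-set machinery. The same idea can be illustrated with scikit-learn's GradientBoostingRegressor, which exposes it through n_iter_no_change (an analogy only, not yikit's actual code path):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = X[:, 0] + 0.1 * rng.randn(300)

# Hold out validation_fraction of the data and stop adding trees once the
# validation score has not improved for n_iter_no_change rounds.
model = GradientBoostingRegressor(
    n_estimators=1000,
    n_iter_no_change=10,
    validation_fraction=0.2,
    random_state=0,
)
model.fit(X, y)
# n_estimators_ is the number of trees actually fitted, typically far
# fewer than the 1000 requested.
print(model.n_estimators_)
```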

fit(X, y)#
predict(X)#
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GBDTRegressor#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class yikit.models.LinearModelRegressor(linear_model='ridge', alpha=1.0, fit_intercept=True, max_iter=1000, tol=0.001, random_state=None)#

Bases: BaseEstimator, RegressorMixin

Linear model regressor supporting Ridge and Lasso regression.

This class provides a scikit-learn compatible wrapper for linear models with regularization. It supports both Ridge (L2) and Lasso (L1) regression.

Parameters:
  • linear_model ({'ridge', 'lasso'}, default='ridge') – Type of linear model to use.

  • alpha (float, default=1.0) – Regularization strength. Higher values mean more regularization.

  • fit_intercept (bool, default=True) – Whether to fit the intercept term.

  • max_iter (int, default=1000) – Maximum number of iterations for the solver.

  • tol (float, default=0.001) – Tolerance for stopping criterion.

  • random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.
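
The practical difference between the two linear_model choices: Ridge (L2) shrinks all coefficients toward zero, while Lasso (L1) can drive irrelevant coefficients exactly to zero. A stand-alone sketch with plain scikit-learn:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = X[:, 0] + 0.05 * rng.randn(100)  # only the first feature is informative

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge keeps every coefficient nonzero; Lasso zeroes out most noise features
# while retaining the informative one.
print(int(np.sum(ridge.coef_ != 0)), int(np.sum(lasso.coef_ != 0)))
```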

estimator_#

The fitted linear model estimator.

Type: Ridge or Lasso

n_features_in_#

Number of features seen during fit.

Type: int

n_iter_#

Number of iterations taken by the solver.

Type: int

rng_#

Random state instance used for reproducibility.

Type: RandomState

Examples

>>> from yikit.models import LinearModelRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = LinearModelRegressor(linear_model='ridge', alpha=1.0)
>>> _ = model.fit(X, y)
>>> predictions = model.predict(X)
fit(X, y)#
predict(X)#
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LinearModelRegressor#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class yikit.models.Objective(estimator, X, y, custom_params=<function Objective.<lambda>>, fixed_params={}, cv=5, random_state=None, scoring=None, n_jobs=None)#

Bases: object

Optuna objective for hyperparameter optimization of a single estimator. An instance is intended to be passed as the objective to optuna.study.Study.optimize: it suggests hyperparameters via trial.suggest_* calls and scores each candidate by cross-validation on X and y. After optimization, get_best_estimator and get_best_params recover the tuned estimator and its parameter set from the completed study.

get_best_estimator(study)#
get_best_params(study)#
class yikit.models.ParamDistributions(estimator, custom_params=<function ParamDistributions.<lambda>>, fixed_params={}, random_state=None)#

Bases: dict

Parameter distributions for OptunaSearchCV.

This class provides a dictionary-like interface for parameter distributions compatible with optuna.integration.OptunaSearchCV. It is designed to work similarly to the Objective class but returns BaseDistribution objects instead of using trial.suggest_* methods directly.

Example

>>> import numpy as np
>>> import optuna
>>> from optuna.integration import OptunaSearchCV
>>> from sklearn.ensemble import RandomForestRegressor
>>> from yikit.models import ParamDistributions
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> param_distributions = ParamDistributions(RandomForestRegressor())
>>> study = optuna.create_study()
>>> optuna_search_cv = OptunaSearchCV(
...     estimator=RandomForestRegressor(),
...     param_distributions=param_distributions,
...     study=study,
...     cv=5,
...     random_state=42,
... )
>>> _ = optuna_search_cv.fit(X, y)
class yikit.models.SupportVectorRegressor(kernel='rbf', gamma='auto', tol=0.01, C=1.0, epsilon=0.1, scale=True)#

Bases: BaseEstimator, RegressorMixin

Support Vector Machine regressor with optional scaling.

This class provides a scikit-learn compatible wrapper for Support Vector Regression (SVR) with built-in feature and target scaling capabilities. Scaling is recommended for SVR as it is sensitive to feature scales.

Parameters:
  • kernel ({'linear', 'poly', 'rbf', 'sigmoid'}, default='rbf') – Kernel type to be used in the algorithm.

  • gamma ({'scale', 'auto'} or float, default='auto') – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.

  • tol (float, default=0.01) – Tolerance for stopping criterion.

  • C (float, default=1.0) – Regularization parameter. The strength of the regularization is inversely proportional to C.

  • epsilon (float, default=0.1) – Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function.

  • scale (bool, default=True) – Whether to scale features and target using StandardScaler. Recommended to keep True for better performance.
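
The scale=True behavior is presumably equivalent to standardizing both X and y around a plain SVR. That idea can be reproduced with stock scikit-learn pieces (a sketch of the concept, not yikit's internal implementation):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.compose import TransformedTargetRegressor

rng = np.random.RandomState(0)
X = rng.randn(100, 3) * np.array([1.0, 100.0, 0.01])  # wildly different scales
y = 3.0 * X[:, 0] + 0.1 * rng.randn(100)

# Standardize the features inside a pipeline, and the target via
# TransformedTargetRegressor, which inverse-transforms predictions back
# to the original scale of y.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel='rbf', C=1.0)),
    transformer=StandardScaler(),
)
model.fit(X, y)
pred = model.predict(X)
print(pred.shape)  # (100,)
```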

estimator_#

The fitted SVR estimator.

Type: SVR

scaler_X_#

Feature scaler if scale=True, None otherwise.

Type: StandardScaler or None

scaler_y_#

Target scaler if scale=True, None otherwise.

Type: StandardScaler or None

n_features_in_#

Number of features seen during fit.

Type: int

Examples

>>> from yikit.models import SupportVectorRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = SupportVectorRegressor(kernel='rbf', C=1.0, scale=True)
>>> _ = model.fit(X, y)
>>> predictions = model.predict(X)
fit(X, y)#
predict(X)#
set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SupportVectorRegressor#

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object