yikit.models package#
Machine learning model wrappers and ensemble methods.
This module provides scikit-learn compatible regressors including ensemble methods, linear models, support vector machines, gradient boosting, neural networks, and hyperparameter optimization utilities.
- class yikit.models.EnsembleRegressor(estimators=(RandomForestRegressor(),), method='blending', cv=5, n_jobs=-1, random_state=None, scoring='neg_mean_squared_error', verbose=0, boruta=True, opt=True)#
Bases: BaseEstimator, RegressorMixin
Ensemble regressor combining multiple base estimators.
This class provides ensemble methods for regression including blending, averaging, and stacking. It supports optional feature selection using Boruta and hyperparameter optimization using Optuna.
- Parameters:
estimators (list of estimator objects, default=(RandomForestRegressor(),)) – List of base estimators to ensemble. Each estimator must be a scikit-learn compatible regressor.
method ({'blending', 'average', 'stacking'}, default='blending') – Ensemble method to use:
- 'blending': train each estimator on a different fold and combine predictions
- 'average': simple average of predictions from all estimators
- 'stacking': use a meta-learner to combine predictions
cv (int, cross-validation generator or iterable, default=5) – Determines the cross-validation splitting strategy.
n_jobs (int, default=-1) – Number of jobs to run in parallel.
random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.
scoring (str, callable or list, default='neg_mean_squared_error') – Scoring metric(s) to use. See scikit-learn documentation for options.
verbose (int, default=0) – Verbosity level.
boruta (bool, default=True) – Whether to perform Boruta feature selection before training.
opt (bool, default=True) – Whether to perform hyperparameter optimization using Optuna.
- estimators_#
The fitted base estimators.
- Type:
list of fitted estimators
- meta_estimator_#
The meta-learner used in stacking (None for other methods).
- Type:
estimator or None
- n_features_in_#
Number of features seen during fit.
- Type:
int
- feature_names_in_#
Names of features seen during fit.
- Type:
ndarray of shape (n_features_in_,)
Examples
>>> from yikit.models import EnsembleRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> ensemble = EnsembleRegressor(
...     estimators=[RandomForestRegressor(), Ridge()],
...     method='blending',
...     cv=5
... )
>>> ensemble.fit(X, y)
>>> predictions = ensemble.predict(X)
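The 'average' and 'stacking' strategies can be sketched with plain scikit-learn. This is a conceptual stand-in, not the EnsembleRegressor implementation (which additionally layers Boruta feature selection and Optuna tuning on top):

```python
# Conceptual sketch of 'average' and 'stacking' using plain scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
estimators = [RandomForestRegressor(n_estimators=50, random_state=0), Ridge()]

# 'average': simple mean of each fitted estimator's predictions
preds = np.column_stack([est.fit(X, y).predict(X) for est in estimators])
avg_pred = preds.mean(axis=1)

# 'stacking': out-of-fold predictions become input features for a meta-learner,
# which learns how to weight the base estimators
oof = np.column_stack([cross_val_predict(est, X, y, cv=5) for est in estimators])
meta = Ridge().fit(oof, y)
stack_pred = meta.predict(preds)
```

Stacking fits the meta-learner on out-of-fold predictions so that it does not simply memorize the base estimators' training-set fit.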
- fit(X, y)#
- predict(X)#
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') EnsembleRegressor#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class yikit.models.GBDTRegressor(boosting_type='gbdt', num_leaves=31, max_depth=-1, learning_rate=0.1, n_estimators=100, subsample_for_bin=200000, objective=None, class_weight=None, min_split_gain=0.0, min_child_weight=0.001, min_child_samples=20, subsample=1.0, subsample_freq=0, colsample_bytree=1.0, reg_alpha=0.0, reg_lambda=0.0, random_state=None, n_jobs=-1, silent=True, importance_type='split', **kwargs)#
Bases: RegressorMixin, BaseEstimator
Gradient Boosting Decision Tree regressor using LightGBM.
This class provides a scikit-learn compatible wrapper for LightGBM’s gradient boosting decision tree regressor. It includes automatic early stopping using a validation set and supports all LightGBM parameters.
- Parameters:
boosting_type (str, default='gbdt') – Type of boosting algorithm to use.
num_leaves (int, default=31) – Maximum tree leaves for base learners.
max_depth (int, default=-1) – Maximum tree depth for base learners, <=0 means no limit.
learning_rate (float, default=0.1) – Boosting learning rate.
n_estimators (int, default=100) – Number of boosted trees to fit.
subsample_for_bin (int, default=200000) – Number of samples for constructing bins.
objective (str or None, default=None) – Specify the learning task and the corresponding learning objective.
class_weight (dict, 'balanced' or None, default=None) – Weights associated with classes.
min_split_gain (float, default=0.0) – Minimum loss reduction required to make a further partition.
min_child_weight (float, default=0.001) – Minimum sum of instance weight (hessian) needed in a child.
min_child_samples (int, default=20) – Minimum number of data needed in a child (leaf).
subsample (float, default=1.0) – Subsample ratio of the training instance.
subsample_freq (int, default=0) – Frequency of subsample, <=0 means no enable.
colsample_bytree (float, default=1.0) – Subsample ratio of columns when constructing each tree.
reg_alpha (float, default=0.0) – L1 regularization term on weights.
reg_lambda (float, default=0.0) – L2 regularization term on weights.
random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.
n_jobs (int, default=-1) – Number of parallel threads.
silent (bool, default=True) – Whether to print messages while running boosting.
importance_type (str, default='split') – The type of feature importance to be filled in feature_importances_.
**kwargs – Additional keyword arguments passed to LGBMRegressor.
- estimator_#
The fitted LightGBM regressor.
- Type:
LGBMRegressor
- feature_importances_#
The feature importances.
- Type:
array-like of shape (n_features,)
- n_features_in_#
Number of features seen during fit.
- Type:
int
- rng_#
Random state instance used for reproducibility.
- Type:
RandomState
Examples
>>> from yikit.models import GBDTRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = GBDTRegressor(n_estimators=100, learning_rate=0.1)
>>> model.fit(X, y)
>>> predictions = model.predict(X)
Notes
This class requires the ‘lightgbm’ package to be installed.
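The automatic early stopping described above can be illustrated with scikit-learn's GradientBoostingRegressor, whose validation_fraction/n_iter_no_change options implement the same held-out-validation idea. This is a stand-in for illustration; GBDTRegressor itself relies on LightGBM's own early-stopping mechanism:

```python
# Illustration of validation-set early stopping using scikit-learn's
# GradientBoostingRegressor as a stand-in for LightGBM's mechanism.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
gbr = GradientBoostingRegressor(
    n_estimators=500,           # upper bound on boosting rounds
    validation_fraction=0.2,    # held-out split used to monitor the loss
    n_iter_no_change=10,        # stop after 10 rounds without improvement
    random_state=0,
)
gbr.fit(X, y)
# n_estimators_ reports how many trees were actually fit before stopping
print(gbr.n_estimators_)
```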
- fit(X, y)#
- predict(X)#
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') GBDTRegressor#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class yikit.models.LinearModelRegressor(linear_model='ridge', alpha=1.0, fit_intercept=True, max_iter=1000, tol=0.001, random_state=None)#
Bases: BaseEstimator, RegressorMixin
Linear model regressor supporting Ridge and Lasso regression.
This class provides a scikit-learn compatible wrapper for linear models with regularization. It supports both Ridge (L2) and Lasso (L1) regression.
- Parameters:
linear_model ({'ridge', 'lasso'}, default='ridge') – Type of linear model to use.
alpha (float, default=1.0) – Regularization strength. Higher values mean more regularization.
fit_intercept (bool, default=True) – Whether to fit the intercept term.
max_iter (int, default=1000) – Maximum number of iterations for the solver.
tol (float, default=0.001) – Tolerance for stopping criterion.
random_state (int, RandomState instance or None, default=None) – Random state for reproducibility.
- estimator_#
The fitted linear model estimator.
- Type:
Ridge or Lasso
- n_features_in_#
Number of features seen during fit.
- Type:
int
- n_iter_#
Number of iterations taken by the solver.
- Type:
int
- rng_#
Random state instance used for reproducibility.
- Type:
RandomState
Examples
>>> from yikit.models import LinearModelRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = LinearModelRegressor(linear_model='ridge', alpha=1.0)
>>> model.fit(X, y)
>>> predictions = model.predict(X)
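What the 'ridge' and 'lasso' choices mean in practice can be sketched with the underlying scikit-learn estimators directly: L2 (Ridge) shrinks coefficients smoothly as alpha grows, while L1 (Lasso) tends to drive irrelevant coefficients exactly to zero.

```python
# Sketch of L2 vs L1 regularization using scikit-learn's Ridge and Lasso.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = 3.0 * X[:, 0] + 0.1 * rng.randn(200)  # only feature 0 matters

ridge_weak = Ridge(alpha=0.01).fit(X, y)
ridge_strong = Ridge(alpha=100.0).fit(X, y)
# Larger alpha -> stronger shrinkage of the informative coefficient
print(ridge_weak.coef_[0], ridge_strong.coef_[0])

lasso = Lasso(alpha=0.1).fit(X, y)
# L1 typically zeroes out the nine irrelevant coefficients here
print(np.count_nonzero(lasso.coef_))
```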
- fit(X, y)#
- predict(X)#
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') LinearModelRegressor#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class yikit.models.Objective(estimator, X, y, custom_params=<function Objective.<lambda>>, fixed_params={}, cv=5, random_state=None, scoring=None, n_jobs=None)#
Bases: object
Optuna objective for cross-validated hyperparameter optimization of an estimator; hyperparameters are suggested via trial.suggest_* methods.
- get_best_estimator(study)#
- get_best_params(study)#
- class yikit.models.ParamDistributions(estimator, custom_params=<function ParamDistributions.<lambda>>, fixed_params={}, random_state=None)#
Bases: dict
Parameter distributions for OptunaSearchCV.
This class provides a dictionary-like interface for parameter distributions compatible with optuna.integration.OptunaSearchCV. It is designed to work similarly to the Objective class but returns BaseDistribution objects instead of using trial.suggest_* methods directly.
Example
>>> import optuna
>>> import numpy as np
>>> from yikit.models import ParamDistributions
>>> from optuna.integration import OptunaSearchCV
>>> from sklearn.ensemble import RandomForestRegressor
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> param_distributions = ParamDistributions(RandomForestRegressor())
>>> study = optuna.create_study()
>>> optuna_search_cv = OptunaSearchCV(
...     estimator=RandomForestRegressor(),
...     param_distributions=param_distributions,
...     study=study,
...     cv=5,
...     random_state=42,
... )
>>> optuna_search_cv.fit(X, y)
- class yikit.models.SupportVectorRegressor(kernel='rbf', gamma='auto', tol=0.01, C=1.0, epsilon=0.1, scale=True)#
Bases: BaseEstimator, RegressorMixin
Support Vector Machine regressor with optional scaling.
This class provides a scikit-learn compatible wrapper for Support Vector Regression (SVR) with built-in feature and target scaling capabilities. Scaling is recommended for SVR as it is sensitive to feature scales.
- Parameters:
kernel ({'linear', 'poly', 'rbf', 'sigmoid'}, default='rbf') – Kernel type to be used in the algorithm.
gamma ({'scale', 'auto'} or float, default='auto') – Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
tol (float, default=0.01) – Tolerance for stopping criterion.
C (float, default=1.0) – Regularization parameter. The strength of the regularization is inversely proportional to C.
epsilon (float, default=0.1) – Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function.
scale (bool, default=True) – Whether to scale features and target using StandardScaler. Recommended to keep True for better performance.
- estimator_#
The fitted SVR estimator.
- Type:
SVR
- scaler_X_#
Feature scaler if scale=True, None otherwise.
- Type:
StandardScaler or None
- scaler_y_#
Target scaler if scale=True, None otherwise.
- Type:
StandardScaler or None
- n_features_in_#
Number of features seen during fit.
- Type:
int
Examples
>>> from yikit.models import SupportVectorRegressor
>>> import numpy as np
>>> X = np.random.randn(100, 10)
>>> y = np.random.randn(100)
>>> model = SupportVectorRegressor(kernel='rbf', C=1.0, scale=True)
>>> model.fit(X, y)
>>> predictions = model.predict(X)
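The scale=True behaviour can be sketched with standard scikit-learn pieces: features are standardized inside a Pipeline and the target via TransformedTargetRegressor. This mirrors the described behaviour; the wrapper's internal implementation may differ.

```python
# Sketch of feature-and-target scaling around SVR, as implied by
# scale=True, built from standard scikit-learn components.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = 1000.0 * rng.randn(100, 5)   # badly scaled features
y = 500.0 * rng.randn(100)       # badly scaled target

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0)),
    transformer=StandardScaler(),  # standardizes y before fitting, inverts on predict
)
model.fit(X, y)
pred = model.predict(X)
```

Without such scaling, the RBF kernel's distances are dominated by the raw feature magnitudes, which is why scale=True is recommended.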
- fit(X, y)#
- predict(X)#
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SupportVectorRegressor#
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object