sparselm.model
Classes implementing generalized linear regressors.
- class sparselm.model.OrdinaryLeastSquares(fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)
Bases: CVXRegressor
Ordinary Least Squares Linear Regression.
Regression objective:
\[\min_{\beta} || X \beta - y ||^2_2\]
- Parameters:
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When True, reuse the solution of the previous call to fit as initialization; otherwise, discard the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – Dictionary of keyword arguments passed to the cvxpy solve method. See the docs linked above for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) – Namespace that contains the underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
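As a point of reference for the regression objective of this class, the unconstrained least-squares minimizer can be computed directly with NumPy. This is only a sketch of the objective itself, not of the cvxpy-based solve used by the class; the synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic, noise-free data (hypothetical values chosen for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true

# Minimize ||X beta - y||_2^2 directly; no intercept term is fitted,
# which corresponds to the fit_intercept=False default.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Value of the objective at the minimizer.
objective = np.sum((X @ beta_hat - y) ** 2)
```

Because the data are noise-free, the minimizer recovers beta_true and the objective is numerically zero.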
- add_constraints(constraints)
Add constraints to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to the solve method.
**kwargs – Keyword arguments passed to the solve method.
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. However, it can be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, in which case the data is rescaled accordingly.
- Return type:
None
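One common way to rescale data by sample weights is to multiply each row of X and y by the square root of its weight. That this is what happens internally is an assumption here, not something stated in this reference. The NumPy sketch below checks that ordinary least squares on the rescaled data matches the weighted normal equations; all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=40)
sample_weight = rng.uniform(0.5, 2.0, size=40)

# Rescale rows by sqrt(w) so that the plain squared error on the rescaled
# data equals the weighted squared error on the original data.
sqrt_w = np.sqrt(sample_weight)
X_scaled = X * sqrt_w[:, None]
y_scaled = y * sqrt_w

beta_rescaled, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)

# Reference solution from the weighted normal equations X^T W X beta = X^T W y.
W = np.diag(sample_weight)
beta_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Both routes give the same coefficients, which is why a solver that only understands unweighted least squares can still honor sample weights.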
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- predict(X)
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
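The definition of \(R^2\) given for score can be written out in a few lines of NumPy. The helper name r2_score_sketch is hypothetical, and the sketch deliberately ignores sample_weight, multi-output targets, and the degenerate case where the total sum of squares is zero.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for 1-D targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - u / v

y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score_sketch(y_true, y_true)                      # exact predictions
constant = r2_score_sketch(y_true, np.full(4, y_true.mean()))  # always predicts the mean
worse = r2_score_sketch(y_true, y_true[::-1])                  # predictions in reversed order
```

The three cases correspond to the statements in the description: exact predictions score 1.0, the constant mean predictor scores 0.0, and a model worse than the mean predictor scores below zero (here -3.0, since u = 20 and v = 5).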
- set_fit_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
OrdinaryLeastSquares
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
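The <component>__<parameter> naming used by set_params can be illustrated with plain scikit-learn objects. In this sketch, sklearn's Ridge stands in for a regressor from this module so that the snippet depends only on scikit-learn; the step names "scale" and "model" are arbitrary choices made for the example.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge(alpha=1.0))])

# Simple estimator: parameters are addressed by their own names.
ridge = Ridge().set_params(alpha=0.1)

# Nested object: <component>__<parameter> reaches into the named step.
pipe.set_params(model__alpha=0.5, scale__with_mean=False)

nested_alpha = pipe.get_params()["model__alpha"]
```

set_params returns the estimator itself, which is why the call can be chained directly after the constructor.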
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
OrdinaryLeastSquares
- class sparselm.model.Lasso(alpha=1.0, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)
Bases: CVXRegressor
Lasso Regressor implemented with cvxpy.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha ||\beta||_1\]
- Parameters:
alpha (float) – Regularization hyper-parameter.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When True, reuse the solution of the previous call to fit as initialization; otherwise, discard the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – Dictionary of keyword arguments passed to the cvxpy solve method. See the docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) – Namespace that contains the underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
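To make the regularized objective of this class concrete, the sketch below minimizes the same expression with proximal gradient descent (ISTA) in NumPy. This is a swapped-in technique chosen so the example has no solver dependency; the class itself is described as solving the problem through cvxpy. The function names, data, and iteration count are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(X, y, beta, alpha):
    """||X beta - y||_2^2 + alpha * ||beta||_1, as in the class objective."""
    return np.sum((X @ beta - y) ** 2) + alpha * np.sum(np.abs(beta))

def lasso_ista(X, y, alpha, n_iter=500):
    """Minimize the Lasso objective by proximal gradient descent."""
    # Step size 1/L, where L = 2 * sigma_max(X)^2 is the Lipschitz constant
    # of the gradient of the squared-error term.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * alpha)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0])

beta_small = lasso_ista(X, y, alpha=1.0)   # mild shrinkage
beta_large = lasso_ista(X, y, alpha=1e4)   # penalty dominates the fit
```

With a small alpha the estimate stays close to the generating coefficients, while a sufficiently large alpha drives every coefficient to exactly zero, which is the sparsity-inducing effect of the \(\ell_1\) term.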
- add_constraints(constraints)
Add constraints to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to the solve method.
**kwargs – Keyword arguments passed to the solve method.
- Returns:
instance of self
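The fit_intercept parameter of this class notes that the data is assumed to be centered when no intercept is estimated. A common way to handle an intercept is to center X and y, solve without an intercept, and recover the intercept from the means. Whether the data preparation step in fit does exactly this is an assumption; the NumPy sketch below uses plain least squares rather than the Lasso objective to keep the idea isolated.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, size=(30, 2))
y = X @ np.array([1.0, -3.0]) + 4.0   # generating intercept is 4.0

# Center the data so that a model without an intercept is appropriate.
X_mean, y_mean = X.mean(axis=0), y.mean()
beta, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)

# Recover the intercept from the column means and the fitted coefficients.
intercept = y_mean - X_mean @ beta
```

Fitting uncentered data without an intercept would instead force the offset to be absorbed into the coefficients, which biases them.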
- generate_problem(X, y, preprocess_data=True, sample_weight=None)
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. However, it can be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, in which case the data is rescaled accordingly.
- Return type:
None
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- predict(X)
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- class sparselm.model.BestSubsetSelection(groups=None, sparse_bound=100, big_M=100, hierarchy=None, ignore_psd_check=True, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)
Bases: MIQPl0
MIQP Best Subset Selection Regressor.
Generalized best subset that allows grouping subsets.
- Parameters:
groups (ArrayLike) – Array-like of integers specifying groups. Its length should equal the number of model coefficients, and each integer entry specifies the group to which the corresponding coefficient belongs. If no grouping is required, pass a list of distinct integers, e.g. using range.
sparse_bound (int) – Upper bound on sparsity, i.e. on the total number of nonzero coefficients.
big_M (float) – Upper bound on the norm of the coefficients associated with each group of coefficients \(||\beta_c||_2\).
hierarchy (list) – A list of lists of integers storing hierarchy relations between coefficients. The sublist at position i contains the indices of the other coefficients on which coefficient i depends. For example, hierarchy = [[1, 2], [0], []] means that coefficient 0 depends on coefficients 1 and 2, coefficient 1 depends on coefficient 0, and coefficient 2 has no dependencies.
ignore_psd_check (bool) – Whether to ignore cvxpy’s PSD checks of the matrix used in the quadratic form. Default is True to avoid raising errors for poorly conditioned matrices; set to False for strict checking.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When True, reuse the solution of the previous call to fit as initialization; otherwise, discard the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – Dictionary of keyword arguments passed to the cvxpy solve method. See the docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) – Namespace that contains the underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
Notes
Installing Gurobi is not required, but it is highly recommended. An open source alternative is SCIP. ECOS_BB also works but can be very slow and has recurring correctness issues. See the Mixed-integer programs section of the cvxpy docs: https://www.cvxpy.org/tutorial/advanced/index.html
Warning
Even with the Gurobi solver, this can take a very long time to converge for large or under-determined problems.
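To illustrate what the sparse_bound parameter constrains, the sketch below solves a tiny best subset problem by exhaustive enumeration with NumPy. This is a swapped-in technique for illustration only: the class formulates the problem as a mixed-integer quadratic program, whereas the hypothetical helper best_subset_bruteforce simply tries every support of size at most sparse_bound and ignores groups, hierarchy, and big_M.

```python
import itertools
import numpy as np

def best_subset_bruteforce(X, y, sparse_bound):
    """Search all supports with at most sparse_bound nonzero coefficients."""
    n_features = X.shape[1]
    # Start from the empty support (all coefficients zero).
    best_rss, best_beta = np.sum(y ** 2), np.zeros(n_features)
    for k in range(1, sparse_bound + 1):
        for support in itertools.combinations(range(n_features), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((X[:, cols] @ coef - y) ** 2)
            if rss < best_rss:
                best_rss = rss
                best_beta = np.zeros(n_features)
                best_beta[cols] = coef
    return best_beta

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
y = X @ np.array([0.0, 2.0, 0.0, 0.0, -1.5, 0.0]) + 0.01 * rng.normal(size=40)

beta = best_subset_bruteforce(X, y, sparse_bound=2)
selected = np.flatnonzero(beta)   # indices of the chosen coefficients
```

The number of candidate supports grows combinatorially with the number of features, which is consistent with the warning that convergence can be slow and with the recommendation to use a dedicated mixed-integer solver.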
- add_constraints(constraints)
Add constraints to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to the solve method.
**kwargs – Keyword arguments passed to the solve method.
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. However, it can be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, in which case the data is rescaled accordingly.
- Return type:
None
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- predict(X)
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
BestSubsetSelection
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
BestSubsetSelection
- class sparselm.model.RidgedBestSubsetSelection(groups=None, sparse_bound=100, eta=1.0, big_M=100, hierarchy=None, tikhonov_w=None, ignore_psd_check=True, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)
Bases: TikhonovMixin, BestSubsetSelection
MIQP best subset selection Regressor with Ridge/Tikhonov regularization.
- Parameters:
groups (ArrayLike) – Array-like of integers specifying groups. Its length should equal the number of model coefficients, and each integer entry specifies the group to which the corresponding coefficient belongs. If no grouping is required, pass a list of distinct integers, e.g. using range.
sparse_bound (int) – Upper bound on sparsity, i.e. on the total number of nonzero coefficients.
eta (float) – L2 regularization hyper-parameter.
big_M (float) – Upper bound on the norm of the coefficients associated with each group of coefficients \(||\beta_c||_2\).
hierarchy (list) – A list of lists of integers storing hierarchy relations between coefficients. The sublist at position i contains the indices of the other coefficients on which coefficient i depends. For example, hierarchy = [[1, 2], [0], []] means that coefficient 0 depends on coefficients 1 and 2, coefficient 1 depends on coefficient 0, and coefficient 2 has no dependencies.
tikhonov_w (np.array) – Matrix to add weights to L2 regularization.
ignore_psd_check (bool) – Whether to ignore cvxpy’s PSD checks of the matrix used in the quadratic form. Default is True to avoid raising errors for poorly conditioned matrices; set to False for strict checking.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When True, reuse the solution of the previous call to fit as initialization; otherwise, discard the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – Dictionary of keyword arguments passed to the cvxpy solve method. See the docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) – Namespace that contains the underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
Notes
Installing Gurobi is not required, but it is highly recommended. An open source alternative is SCIP. ECOS_BB also works but can be very slow and has recurring correctness issues. See the Mixed-integer programs section of the cvxpy docs: https://www.cvxpy.org/tutorial/advanced/index.html
Warning
Even with the Gurobi solver, this can take a very long time to converge for large or under-determined problems.
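The list-of-lists format of the hierarchy parameter can be made concrete with a small pure-Python check. The sketch assumes that "coefficient i depends on coefficient j" means coefficient i may be nonzero only when coefficient j is also nonzero; this reading is an assumption, and the helper name satisfies_hierarchy and the tolerance are hypothetical.

```python
def satisfies_hierarchy(coef, hierarchy, tol=1e-10):
    """Return True if every nonzero coefficient has all its dependencies nonzero."""
    active = [abs(c) > tol for c in coef]
    for i, deps in enumerate(hierarchy):
        # hierarchy[i] lists the coefficients that coefficient i depends on.
        if active[i] and not all(active[j] for j in deps):
            return False
    return True

# Example from the parameter description: coefficient 0 depends on 1 and 2,
# coefficient 1 depends on 0, and coefficient 2 has no dependencies.
hierarchy = [[1, 2], [0], []]

only_2 = satisfies_hierarchy([0.0, 0.0, 0.7], hierarchy)    # True: 2 needs nothing
only_0 = satisfies_hierarchy([1.2, 0.0, 0.0], hierarchy)    # False: 0 needs 1 and 2
all_on = satisfies_hierarchy([1.2, -0.4, 0.7], hierarchy)   # True: all dependencies met
```

Under this reading, the example hierarchy allows coefficient 2 to be selected on its own, while coefficients 0 and 1 can only enter the model together and only alongside coefficient 2.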
- add_constraints(constraints)
Add constraints to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to the solve method.
**kwargs – Keyword arguments passed to the solve method.
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. However, it can be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, in which case the data is rescaled accordingly.
- Return type:
None
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- predict(X)
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
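The definition of \(R^2\) given for score can be checked by hand. The following is a minimal, dependency-free sketch of the unweighted, single-output case only; the function name r2_score_sketch is hypothetical and this is not the library's implementation.

```python
def r2_score_sketch(y_true, y_pred):
    """Return 1 - u / v for the unweighted, single-output case."""
    mean = sum(y_true) / len(y_true)
    # u: residual sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true
    v = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_sketch(y_true, y_true)                # exact predictions
constant = r2_score_sketch(y_true, [2.5] * 4)            # always predict the mean
reversed_fit = r2_score_sketch(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

Here perfect is 1.0, constant is 0.0, and reversed_fit is negative, matching the three cases described above. The sketch does not handle constant y_true, where \(v = 0\).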
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (RidgedBestSubsetSelection) –
- Returns:
self – The updated object.
- Return type:
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
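The <component>__<parameter> naming used by set_params can be pictured with a small sketch. It only shows how such names split on the first double underscore; the function name split_nested_params and the component name regressor are hypothetical, and this is not scikit-learn's actual code.

```python
def split_nested_params(params):
    """Separate top-level parameters from '<component>__<parameter>' entries."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"alpha": 0.5, "regressor__eta": 2.0, "regressor__big_M": 50}
)
```

In this sketch own holds parameters of the outer object, while nested groups the remaining entries by the component they would be forwarded to.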
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (RidgedBestSubsetSelection) –
- Returns:
self – The updated object.
- Return type:
- class sparselm.model.RegularizedL0(groups=None, alpha=1.0, big_M=100, hierarchy=None, ignore_psd_check=True, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
MIQPl0
Implementation of mixed-integer quadratic programming l0 regularized Regressor.
Supports grouping parameters and group-level hierarchy, but requires groups as a compulsory argument.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} z_G\]
where G represents groups of features/coefficients and the \(z_G\) are boolean-valued slack variables.
- Parameters:
groups (ArrayLike) – 1D array-like of integers specifying groups. Its length should equal the number of model coefficients, and each integer entry specifies the group to which the corresponding coefficient belongs. If no grouping is needed, pass a list of all distinct numbers (i.e. range(len(coefs))) to create singleton groups for each parameter.
alpha (float) – L0 pseudo-norm regularization hyper-parameter.
big_M (float) – Upper bound on the norm of the coefficients associated with each group of coefficients \(||\beta_c||_2\).
hierarchy (list) – A list of lists of integers storing hierarchy relations between groups. The sublist at index i contains the indices of the groups on which group i depends. For example, hierarchy = [[1, 2], [0], []] means that group 0 depends on groups 1 and 2, group 1 depends on group 0, and group 2 has no dependencies.
ignore_psd_check (bool) – Whether to ignore cvxpy’s PSD checks of the matrix used in the quadratic form. Default is True to avoid raising errors for poorly conditioned matrices. Set to False to enforce the check.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept (float) – Independent term in decision function.
canonicals (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
Notes
Installation of Gurobi is not a must, but highly recommended. An open source alternative is SCIP. ECOS_BB also works but can be very slow, and has recurring correctness issues. See the Mixed-integer programs section of the cvxpy docs: https://www.cvxpy.org/tutorial/advanced/index.html
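To make the regularized objective and the groups argument concrete, the sketch below evaluates the objective for a fixed coefficient vector. It assumes that at a solution \(z_G = 1\) exactly when some coefficient in group G is nonzero; in the actual mixed-integer problem \(z_G\) is a decision variable linked to the coefficients through constraints. The function names are hypothetical and no solver is involved.

```python
def group_indicators(beta, groups):
    """Return {group: 1 if any coefficient in the group is nonzero else 0}."""
    z = {g: 0 for g in groups}
    for coef, g in zip(beta, groups):
        if coef != 0:
            z[g] = 1
    return z


def l0_objective(X, y, beta, groups, alpha):
    """Evaluate ||X beta - y||_2^2 + alpha * sum_G z_G for a fixed beta."""
    residual = 0.0
    for row, target in zip(X, y):
        pred = sum(x * b for x, b in zip(row, beta))
        residual += (pred - target) ** 2
    z = group_indicators(beta, groups)
    return residual + alpha * sum(z.values())


X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
groups = [0, 0, 1]          # coefficients 0 and 1 share a group
beta = [1.0, 2.0, 0.0]      # group 1 is switched off

z = group_indicators(beta, groups)
value = l0_objective(X, y, beta, groups, alpha=1.5)

# Singleton groups, one per coefficient, when no grouping is wanted.
singletons = list(range(len(beta)))
```

With this data the fit is exact, so value consists only of the penalty for the single active group.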
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (RegularizedL0) –
- Returns:
self – The updated object.
- Return type:
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (RegularizedL0) –
- Returns:
self – The updated object.
- Return type:
- class sparselm.model.L1L0(groups=None, alpha=1.0, eta=1.0, big_M=100, hierarchy=None, ignore_psd_check=True, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
MixedL0
L1L0 regularized Regressor.
Regressor with L1L0 regularization solved with mixed integer programming as discussed in:
https://arxiv.org/abs/1807.10753
Extended to allow grouping of coefficients and group-level hierarchy as described in:
https://doi.org/10.1287/opre.2015.1436
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} z_G + \eta ||\beta||_1\]
where G represents groups of features/coefficients and the \(z_G\) are boolean-valued slack variables.
- Parameters:
groups (ArrayLike) – 1D array-like of integers specifying groups. Its length should equal the number of model coefficients, and each integer entry specifies the group to which the corresponding coefficient belongs. If no grouping is needed, pass a list of all distinct numbers (i.e. range(len(coefs))) to create singleton groups for each parameter.
alpha (float) – L0 pseudo-norm regularization hyper-parameter.
eta (float) – L1 regularization hyper-parameter.
big_M (float) – Upper bound on the norm of the coefficients associated with each group of coefficients \(||\beta_c||_2\).
hierarchy (list) – A list of lists of integers storing hierarchy relations between coefficients. The sublist at index i contains the indices of the coefficients on which coefficient i depends. For example, hierarchy = [[1, 2], [0], []] means that coefficient 0 depends on coefficients 1 and 2, coefficient 1 depends on coefficient 0, and coefficient 2 has no dependencies.
ignore_psd_check (bool) – Whether to ignore cvxpy’s PSD checks of the matrix used in the quadratic form. Default is True to avoid raising errors for poorly conditioned matrices. Set to False to enforce the check.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept (float) – Independent term in decision function.
canonicals (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
Notes
Installation of Gurobi is not a must, but highly recommended. An open source alternative is SCIP. ECOS_BB also works but can be very slow, and has recurring correctness issues. See the Mixed-integer programs section of the cvxpy docs: https://www.cvxpy.org/tutorial/advanced/index.html
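The hierarchy argument can be read as a set of "may only be active if" rules. The sketch below checks a candidate set of active items against such a list. It assumes that "item i depends on item j" means i may be nonzero only when j is also nonzero; this page does not spell out that interpretation, and the function name hierarchy_satisfied is hypothetical.

```python
def hierarchy_satisfied(active, hierarchy):
    """Return True if every active item has all of its dependencies active."""
    for item, requirements in enumerate(hierarchy):
        if item in active and any(req not in active for req in requirements):
            return False
    return True


# Item 0 depends on 1 and 2; item 1 depends on 0; item 2 has no dependencies.
hierarchy = [[1, 2], [0], []]

all_on = hierarchy_satisfied({0, 1, 2}, hierarchy)
missing_two = hierarchy_satisfied({0, 1}, hierarchy)   # 0 is active without 2
only_two = hierarchy_satisfied({2}, hierarchy)
only_one = hierarchy_satisfied({1}, hierarchy)         # 1 is active without 0
```

Under this reading, items 0 and 1 in the example can only appear together with each other and with item 2, while item 2 may be active on its own.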
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- class sparselm.model.L2L0(groups=None, alpha=1.0, eta=1.0, big_M=100, hierarchy=None, tikhonov_w=None, ignore_psd_check=True, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
TikhonovMixin, MixedL0
L2L0 regularized Regressor.
Based on Regressor with L2L0 regularization solved with mixed integer programming proposed in:
https://arxiv.org/abs/2204.13789
Extended to allow grouping of coefficients and group-level hierarchy as described in:
https://doi.org/10.1287/opre.2015.1436
It also allows using a Tikhonov matrix in the L2 term.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} z_G + \eta ||W\beta||^2_2\]
where G represents groups of features/coefficients, the \(z_G\) are boolean-valued slack variables, and W is a Tikhonov matrix.
- Parameters:
groups (ArrayLike) – 1D array-like of integers specifying groups. Its length should equal the number of model coefficients, and each integer entry specifies the group to which the corresponding coefficient belongs. If no grouping is needed, pass a list of all distinct numbers (i.e. range(len(coefs))) to create singleton groups for each parameter.
alpha (float) – L0 pseudo-norm regularization hyper-parameter.
eta (float) – L2 regularization hyper-parameter.
big_M (float) – Upper bound on the norm of the coefficients associated with each group of coefficients \(||\beta_c||_2\).
hierarchy (list) – A list of lists of integers storing hierarchy relations between coefficients. The sublist at index i contains the indices of the coefficients on which coefficient i depends. For example, hierarchy = [[1, 2], [0], []] means that coefficient 0 depends on coefficients 1 and 2, coefficient 1 depends on coefficient 0, and coefficient 2 has no dependencies.
tikhonov_w (np.array) – Matrix to add weights to L2 regularization.
ignore_psd_check (bool) – Whether to ignore cvxpy’s PSD checks of the matrix used in the quadratic form. Default is True to avoid raising errors for poorly conditioned matrices. Set to False to enforce the check.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept (float) – Independent term in decision function.
canonicals (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
Notes
Installation of Gurobi is not a must, but highly recommended. An open source alternative is SCIP. ECOS_BB also works but can be very slow, and has recurring correctness issues. See the Mixed-integer programs section of the cvxpy docs: https://www.cvxpy.org/tutorial/advanced/index.html
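The role of the Tikhonov matrix W in the \(\eta ||W\beta||^2_2\) term can be seen by evaluating that term for a fixed coefficient vector. This is a dependency-free sketch; the function name tikhonov_penalty and the example matrices are hypothetical, and it does not reproduce how the library builds the term in cvxpy.

```python
def tikhonov_penalty(W, beta, eta):
    """Return eta * ||W beta||_2^2 for a matrix W given as a list of rows."""
    w_beta = [sum(w * b for w, b in zip(row, beta)) for row in W]
    return eta * sum(v ** 2 for v in w_beta)


beta = [1.0, -2.0, 3.0]

# With W equal to the identity the term reduces to a plain ridge penalty.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ridge_like = tikhonov_penalty(identity, beta, eta=0.5)

# A first-difference matrix penalizes jumps between neighbouring coefficients.
difference = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
smoothness = tikhonov_penalty(difference, beta, eta=1.0)
```

Choosing W therefore decides which linear combinations of coefficients are shrunk, while eta sets the overall strength of the shrinkage.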
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
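The \(R^2\) definition above can be checked by hand with a small, library-free sketch. The function name r2_score_sketch is hypothetical and the code only mirrors the formula \(1 - u/v\) for a single-output target without sample weights; it is not the sklearn implementation.

```python
def r2_score_sketch(y_true, y_pred):
    """Return 1 - u / v for a single-output target (hypothetical helper)."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_score_sketch(y_true, y_true)                  # exact predictions
constant = r2_score_sketch(y_true, [2.5] * 4)              # always predicts the mean
worse = r2_score_sketch(y_true, [4.0, 3.0, 2.0, 1.0])      # reversed predictions

print(perfect, constant, worse)
```

The three cases illustrate the statements in the description: a perfect fit scores 1.0, a constant prediction of the mean scores 0.0, and a sufficiently bad model scores below zero.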
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
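The <component>__<parameter> naming used for nested objects can be illustrated with a small, library-free sketch. The helper name split_nested_param and the step name regressor are hypothetical; the code only mirrors the naming convention described above and makes no claim about how set_params is implemented internally.

```python
def split_nested_param(name):
    """Split a parameter name at the first double underscore.

    Returns (component, parameter). For a plain name such as "alpha"
    the component is None, meaning the parameter belongs to the
    estimator itself.
    """
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name
    return component, parameter


plain = split_nested_param("alpha")
nested = split_nested_param("regressor__alpha")
deep = split_nested_param("pipe__regressor__alpha")

print(plain, nested, deep)
```

Only the first delimiter is consumed at each level, so a deeper name such as pipe__regressor__alpha is handed to the pipe component as regressor__alpha.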
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- class sparselm.model.GroupLasso(groups=None, alpha=1.0, group_weights=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
Lasso
Group Lasso implementation.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2\]
where G represents groups of features/coefficients.
- Parameters:
groups (ArrayLike) – array-like of integers specifying groups. Its length should equal the number of model coefficients (features); each integer entry specifies the group the corresponding coefficient belongs to.
alpha (float) – Regularization hyper-parameter.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
group_weights (ArrayLike, optional) – Weights for each group to use in the regularization term. The default is the square root of each group's size, but any weights can be specified. The array must be the same length as the groups given. To weight all groups equally, pass an array of ones.
standardize (bool, optional) – Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
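The group penalty term \(\alpha \sum_{G} w_G ||\beta_G||_2\) from the objective above can be evaluated by hand with a short, library-free sketch. The function name group_lasso_penalty is hypothetical, and the code assumes one weight per distinct group, ordered by sorted group label; it illustrates the formula only and is not the cvxpy expression that the class builds.

```python
import math


def group_lasso_penalty(beta, groups, alpha=1.0, group_weights=None):
    """Evaluate alpha * sum_G w_G * ||beta_G||_2 (hypothetical helper)."""
    # Collect the coefficients that belong to each group label.
    members = {}
    for coefficient, label in zip(beta, groups):
        members.setdefault(label, []).append(coefficient)
    labels = sorted(members)

    # Default weights: square root of each group's size, as documented.
    if group_weights is None:
        group_weights = [math.sqrt(len(members[label])) for label in labels]

    total = 0.0
    for weight, label in zip(group_weights, labels):
        l2_norm = math.sqrt(sum(c * c for c in members[label]))
        total += weight * l2_norm
    return alpha * total


beta = [3.0, 4.0, 0.0, 0.0]
groups = [0, 0, 1, 1]

penalty_default = group_lasso_penalty(beta, groups)
penalty_ones = group_lasso_penalty(beta, groups, group_weights=[1.0, 1.0])

print(penalty_default, penalty_ones)
```

A group whose coefficients are all zero contributes nothing to the penalty, which is why the regularization favours solutions where entire groups drop out together.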
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. It can, however, be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
GroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
GroupLasso
- class sparselm.model.OverlapGroupLasso(group_list=None, alpha=1.0, group_weights=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None, **kwargs)[source]#
Bases:
GroupLasso
Overlap Group Lasso implementation.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2\]
where G represents groups of features/coefficients and overlapping groups are acceptable, meaning a coefficient can be in more than one group.
- Parameters:
group_list (list of lists of int) – list of lists of integers specifying groups. The length of the outer list should equal the number of model coefficients. Each inner list holds the integer labels of the groups that the coefficient at that index belongs to. For example, [[1,2],[2,3],[1,2,3]] means the first coefficient belongs to groups 1 and 2, the second to groups 2 and 3, and the third to groups 1, 2, and 3. In other words, the 3 groups would be (0, 2), (0, 1, 2), and (1, 2).
alpha (float) – Regularization hyper-parameter.
group_weights (ArrayLike, optional) – Weights for each group to use in the regularization term. The default is the square root of each group's size, but any weights can be specified. The array must be the same length as the number of different groups given. To weight all groups equally, pass an array of ones.
standardize (bool, optional) – Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
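The group_list format described in the parameters above lists, for each coefficient, the groups it belongs to. A short, library-free sketch shows how that per-coefficient view translates into per-group coefficient indices. The function name groups_from_group_list is hypothetical and is not part of sparselm.

```python
def groups_from_group_list(group_list):
    """Map each group label to the indices of its coefficients.

    group_list[i] holds the labels of every group that coefficient i
    belongs to, so a coefficient may appear in several groups.
    """
    groups = {}
    for coefficient_index, labels in enumerate(group_list):
        for label in labels:
            groups.setdefault(label, []).append(coefficient_index)
    # Sort by label and freeze the index lists for readability.
    return {label: tuple(indices) for label, indices in sorted(groups.items())}


result = groups_from_group_list([[1, 2], [2, 3], [1, 2, 3]])
print(result)  # {1: (0, 2), 2: (0, 1, 2), 3: (1, 2)}
```

The output reproduces the three groups given in the group_list description: coefficient 2 appears in all of them, which is exactly the overlap this class is designed to handle.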
- generate_problem(X, y, preprocess_data=True, sample_weight=None)[source]#
Initialize cvxpy problem from the generated objective function.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, to rescale the data accordingly.
- Return type:
None
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
OverlapGroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
OverlapGroupLasso
- class sparselm.model.SparseGroupLasso(groups=None, l1_ratio=0.5, alpha=1.0, group_weights=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
GroupLasso
Sparse Group Lasso.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha r ||\beta||_1 + \alpha (1 - r) \sum_{G}||\beta_G||_2\]
where G represents groups of features/coefficients and r is the L1 ratio.
- Parameters:
groups (ArrayLike) – array-like of integers specifying groups. Its length should equal the number of model coefficients (features); each integer entry specifies the group the corresponding coefficient belongs to.
alpha (float) – Regularization hyper-parameter.
l1_ratio (float) – Mixing parameter between l1 and group lasso regularization.
group_weights (ArrayLike, optional) – Weights for each group to use in the regularization term. The default is the square root of each group's size, but any weights can be specified. The array must be the same length as the groups given. To weight all groups equally, pass an array of ones.
standardize (bool, optional) – Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
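The role of l1_ratio in the objective above can be seen by evaluating the two penalty terms by hand. The sketch below is library-free and follows the formula exactly as written, without group weights; the function name sparse_group_lasso_penalty is hypothetical and the code is not the cvxpy expression that the class builds.

```python
import math


def sparse_group_lasso_penalty(beta, groups, alpha=1.0, l1_ratio=0.5):
    """Evaluate alpha*r*||beta||_1 + alpha*(1-r)*sum_G ||beta_G||_2."""
    # Plain l1 term over all coefficients.
    l1_term = sum(abs(c) for c in beta)

    # Sum of per-group l2 norms.
    members = {}
    for coefficient, label in zip(beta, groups):
        members.setdefault(label, []).append(coefficient)
    group_term = sum(
        math.sqrt(sum(c * c for c in coefficients))
        for coefficients in members.values()
    )

    return alpha * l1_ratio * l1_term + alpha * (1.0 - l1_ratio) * group_term


beta = [3.0, -4.0, 0.0, 0.0]
groups = [0, 0, 1, 1]

mixed = sparse_group_lasso_penalty(beta, groups, alpha=2.0, l1_ratio=0.5)
pure_l1 = sparse_group_lasso_penalty(beta, groups, alpha=2.0, l1_ratio=1.0)
pure_group = sparse_group_lasso_penalty(beta, groups, alpha=2.0, l1_ratio=0.0)

print(mixed, pure_l1, pure_group)
```

Setting l1_ratio to 1 leaves only the l1 term, and setting it to 0 leaves only the group term, so the parameter interpolates between within-group sparsity and whole-group sparsity.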
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is almost always called from the fit method rather than directly. It can, however, be called directly when further control over the problem is needed through the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None. Only used if preprocess_data=True, to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
- Returns:
self – The updated object.
- Return type:
SparseGroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
SparseGroupLasso
- class sparselm.model.RidgedGroupLasso(groups=None, alpha=1.0, delta=(1.0,), group_weights=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=False, solver=None, solver_options=None)[source]#
Bases:
GroupLasso
Ridged Group Lasso implementation.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2 + \sum_{G} \delta_G ||\beta_G||^2_2\]
where G represents groups of features/coefficients and \(\delta_G\) is the ridge hyper-parameter for group G.
For details on proper standardization refer to: http://faculty.washington.edu/nrsimon/standGL.pdf
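To make the terms of the objective above concrete, the following is a minimal sketch that evaluates it for a fixed coefficient vector using plain Python lists. It is not the library's implementation (which builds a cvxpy problem); the function name `ridged_group_lasso_objective` and the convention that `group_weights` and `delta` are ordered by sorted group label are assumptions made for illustration.

```python
import math


def ridged_group_lasso_objective(X, y, beta, groups, alpha, delta, group_weights):
    """Evaluate ||X b - y||^2 + alpha * sum_G w_G ||b_G|| + sum_G delta_G ||b_G||^2.

    Hypothetical helper for illustration only; not part of sparselm.
    """
    # Squared residual norm ||X beta - y||_2^2
    residual = [
        sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) - y_i
        for row, y_i in zip(X, y)
    ]
    loss = sum(r * r for r in residual)

    # Collect the coefficients of each group (assumed order: sorted labels)
    labels = sorted(set(groups))
    by_group = {g: [b for b, lab in zip(beta, groups) if lab == g] for g in labels}

    # A length-1 delta is shared by every group
    if len(delta) == 1:
        delta = list(delta) * len(labels)

    penalty = 0.0
    for g, w_g, d_g in zip(labels, group_weights, delta):
        sq_norm = sum(b * b for b in by_group[g])
        # Group lasso term uses the norm, the ridge term uses the squared norm
        penalty += alpha * w_g * math.sqrt(sq_norm) + d_g * sq_norm
    return loss + penalty


value = ridged_group_lasso_objective(
    X=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    y=[3.0, 4.0, 1.0],
    beta=[3.0, 4.0, 0.0],
    groups=[0, 0, 1],
    alpha=1.0,
    delta=(1.0,),
    group_weights=[1.0, 1.0],
)
```

The group lasso term grows linearly with each group norm, which is what drives whole groups to zero, while the ridge term grows quadratically and only shrinks the coefficients.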
- Parameters:
groups (ArrayLike) – array-like of integers specifying groups. Its length should equal the number of model coefficients, where each integer entry specifies the group the corresponding coefficient belongs to.
alpha (float) – Regularization hyper-parameter.
delta (ArrayLike) – optional. Positive 1D array. Regularization vector for the ridge penalty. The array must be of the same length as the number of groups, or of length 1 if all groups are meant to have the same ridge hyper-parameter.
group_weights (ArrayLike) – optional. Weights for each group to use in the regularization term. The default is to use the square root of the group sizes; however, any weight can be specified. The array must be the same length as the groups given. If you need all groups weighted equally, just pass an array of ones.
standardize (bool) – optional. Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
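The definition of \(R^2\) given for score can be checked by hand with a few lines of plain Python. This is a minimal sketch of the unweighted, single-output formula only; the function name `r2_sketch` is hypothetical, and it assumes y_true is not constant (otherwise the total sum of squares is zero).

```python
def r2_sketch(y_true, y_pred):
    """Unweighted single-output R^2 = 1 - u / v (illustration only)."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


perfect = r2_sketch([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # exact predictions
constant = r2_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
reversed_ = r2_sketch([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # worse than the mean
```

The three calls cover the cases named in the description: a perfect fit, a constant model predicting the mean of y, and a model that does worse than that constant model and therefore scores below zero.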
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (RidgedGroupLasso) –
- Returns:
self – The updated object.
- Return type:
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
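The <component>__<parameter> naming convention described for set_params can be illustrated with a small sketch that splits such a name at the first double underscore. This is not scikit-learn's code; the helper `split_param_name` and the example names `regressor` and `outer__inner__alpha` are hypothetical.

```python
def split_param_name(name):
    """Split '<component>__<parameter>' at the first '__' (illustration only).

    Returns (None, name) for a plain parameter of the estimator itself.
    """
    component, delimiter, sub_parameter = name.partition("__")
    if not delimiter:
        # No double underscore: the parameter belongs to the estimator itself
        return None, name
    # The remainder may itself be nested and is passed on to the component
    return component, sub_parameter


plain = split_param_name("alpha")
nested = split_param_name("regressor__alpha")
deeper = split_param_name("outer__inner__alpha")
```

Splitting only at the first delimiter lets each level of a nested object peel off its own component name and forward the rest unchanged.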
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (RidgedGroupLasso) –
- Returns:
self – The updated object.
- Return type:
- class sparselm.model.AdaptiveLasso(alpha=1.0, max_iter=3, eps=1e-06, tol=1e-10, update_function=None, fit_intercept=False, copy_X=True, warm_start=True, solver=None, solver_options=None, **kwargs)[source]#
Bases:
Lasso
Adaptive Lasso implementation.
Also known as iteratively re-weighted Lasso.
Regularized regression objective:
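\[\min_{\beta} || X \beta - y ||^2_2 + \alpha || w \odot \beta ||_1\]
Where w represents a vector of weights that is iteratively updated, and \(\odot\) denotes the element-wise product, so the penalty equals \(\alpha \sum_i w_i |\beta_i|\).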
- Parameters:
alpha (float) – Regularization hyper-parameter.
max_iter (int) – Maximum number of re-weighting iteration steps.
eps (float) – Value to add to denominator of weights.
tol (float) – Absolute convergence tolerance for difference between weights at successive steps.
update_function (Callable) – optional. A function with signature f(beta, eps) used to update the weights at each iteration. Default is 1/(|beta| + eps).
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
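The interplay of max_iter, eps, tol and update_function can be sketched on a toy problem where X is the identity matrix, so each weighted Lasso step has a closed-form soft-thresholding solution. This is only an illustration of iterative re-weighting, not the library's cvxpy-based solver. The function names, the initial weights of one, and the use of the Euclidean norm for the convergence check are all assumptions.

```python
import math


def default_update(beta, eps):
    """Default weight update stated in the parameter list: 1 / (|beta| + eps)."""
    return [1.0 / (abs(b) + eps) for b in beta]


def soft_threshold(z, t):
    """Minimizer of (b - z)^2 + 2 * t * |b| over b."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_lasso_identity(y, alpha=1.0, max_iter=3, eps=1e-6, tol=1e-10,
                            update_function=default_update):
    """Iteratively re-weighted Lasso for X = identity (toy sketch)."""
    weights = [1.0] * len(y)  # assumed start: the first pass is a plain Lasso
    beta = list(y)
    for _ in range(max_iter):
        # With X = I the objective separates per coefficient:
        # (b_i - y_i)^2 + alpha * w_i * |b_i|, solved by thresholding at alpha * w_i / 2
        beta = [soft_threshold(y_i, alpha * w_i / 2.0) for y_i, w_i in zip(y, weights)]
        new_weights = update_function(beta, eps)
        change = math.sqrt(sum((n - o) ** 2 for n, o in zip(new_weights, weights)))
        weights = new_weights
        if change <= tol:  # weights stopped changing, so stop early
            break
    return beta, weights


beta, weights = adaptive_lasso_identity([3.0, 0.2])
```

Coefficients that are zeroed in one pass receive a weight of about 1/eps in the next pass, which keeps them at zero, while large coefficients are penalized less and less.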
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (AdaptiveLasso) –
- Returns:
self – The updated object.
- Return type:
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (AdaptiveLasso) –
- Returns:
self – The updated object.
- Return type:
- class sparselm.model.AdaptiveGroupLasso(groups=None, alpha=1.0, group_weights=None, max_iter=3, eps=1e-06, tol=1e-10, update_function=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=True, solver=None, solver_options=None, **kwargs)[source]#
Bases:
AdaptiveLasso, GroupLasso
Adaptive Group Lasso, iteratively re-weighted group lasso.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2\]
Where w represents a vector of weights that is iteratively updated.
- Parameters:
groups (list or ndarray) – array-like of integers specifying groups. Its length should equal the number of model coefficients, where each integer entry specifies the group the corresponding coefficient belongs to.
alpha (float) – Regularization hyper-parameter.
group_weights (ndarray) – optional. Weights for each group to use in the regularization term. The default is to use the square root of the group sizes; however, any weight can be specified. The array must be the same length as the groups given. If you need all groups weighted equally, just pass an array of ones.
max_iter (int) – Maximum number of re-weighting iteration steps.
eps (float) – Value to add to denominator of weights.
tol (float) – Absolute convergence tolerance for difference between weights at successive steps.
update_function (Callable) – optional. A function with signature f(group_norms, eps) used to update the weights at each iteration, where group_norms are the norms of the coefficients beta for each group. Default is 1/(group_norms + eps).
standardize (bool) – optional. Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
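The group-wise weight update described for update_function can be sketched in plain Python: compute the Euclidean norm of the coefficients in each group, then apply the stated default 1/(group_norms + eps). The helper names are hypothetical, and ordering the groups by sorted label is an assumption made for this illustration.

```python
import math


def group_norms(beta, groups):
    """Euclidean norm of the coefficients in each group, ordered by sorted label."""
    labels = sorted(set(groups))
    return [
        math.sqrt(sum(b * b for b, g in zip(beta, groups) if g == label))
        for label in labels
    ]


def default_group_update(norms, eps):
    """Default weight update stated in the parameter list: 1 / (group_norms + eps)."""
    return [1.0 / (n + eps) for n in norms]


norms = group_norms([3.0, 4.0, 0.0, 0.0], groups=[0, 0, 1, 1])
new_weights = default_group_update(norms, eps=1e-6)
```

A group whose coefficients are all zero gets a weight close to 1/eps, so it is penalized heavily in the next iteration, while groups with large norms are penalized less.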
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (AdaptiveGroupLasso) –
- Returns:
self – The updated object.
- Return type:
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (AdaptiveGroupLasso) –
- Returns:
self – The updated object.
- Return type:
- class sparselm.model.AdaptiveOverlapGroupLasso(group_list=None, alpha=1.0, group_weights=None, max_iter=3, eps=1e-06, tol=1e-10, update_function=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=True, solver=None, solver_options=None)[source]#
Bases:
OverlapGroupLasso, AdaptiveGroupLasso
Adaptive Overlap Group Lasso implementation.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2\]
Where G represents groups of features/coefficients, and overlapping groups are acceptable, meaning a coefficient can be in more than one group.
- Parameters:
group_list (list of lists) – list of lists of integers specifying groups. The length of the outer list should equal the number of model coefficients. Each inner list has integers specifying the groups the coefficient at that index belongs to. For example, [[1,2],[2,3],[1,2,3]] means the first coefficient belongs to groups 1 and 2, the second to groups 2 and 3, and the third to groups 1, 2 and 3. In other words, the 3 groups would be: (0, 2), (0, 1, 2), (1, 2).
alpha (float) – Regularization hyper-parameter.
group_weights (ndarray) – optional. Weights for each group to use in the regularization term. The default is to use the square root of the group sizes; however, any weight can be specified. The array must be the same length as the number of different groups given. If you need all groups weighted equally, just pass an array of ones.
max_iter (int) – Maximum number of re-weighting iteration steps.
eps (float) – Value to add to denominator of weights.
tol (float) – Absolute convergence tolerance for difference between weights at successive steps.
update_function (Callable) – optional. A function with signature f(group_norms, eps) used to update the weights at each iteration, where group_norms are the norms of the coefficients beta for each group. Default is 1/(group_norms + eps).
standardize (bool) – optional. Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
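The group_list format lists, for every coefficient, the groups it belongs to. The sketch below inverts that mapping to recover the coefficient indices of each group, reproducing the worked example from the parameter description. The helper name `groups_from_group_list` is hypothetical and not part of the library.

```python
def groups_from_group_list(group_list):
    """Map each group label to the tuple of coefficient indices it contains."""
    members = {}
    for coef_index, labels in enumerate(group_list):
        for label in labels:
            # A coefficient listed under several labels joins several groups
            members.setdefault(label, []).append(coef_index)
    return {label: tuple(members[label]) for label in sorted(members)}


overlapping = groups_from_group_list([[1, 2], [2, 3], [1, 2, 3]])
```

For the example input, group 1 contains coefficients (0, 2), group 2 contains (0, 1, 2), and group 3 contains (1, 2), matching the three groups listed in the description.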
- Variables:
coef_ (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept_ (float) – Independent term in decision function.
canonicals_ (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV). This is because a new cvxpy problem is generated for any cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraints to add to the problem.
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Initialize cvxpy problem from the generated objective function.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
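The \(R^2\) definition used by score can be checked by hand. The following is a minimal sketch with NumPy only; the helper name r2_sketch is hypothetical, it ignores sample_weight and multioutput handling, and it is not the implementation used by the estimator.

```python
import numpy as np


def r2_sketch(y_true, y_pred):
    """Unweighted, single-output R^2 = 1 - u / v (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    u = ((y_true - y_pred) ** 2).sum()          # residual sum of squares
    v = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1.0 - u / v


y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = r2_sketch(y_true, y_true)                        # exact predictions
constant = r2_sketch(y_true, np.full(4, y_true.mean()))    # always predicts the mean
worse = r2_sketch(y_true, y_true[::-1])                    # worse than the constant model

print(perfect, constant, worse)
```

The three cases correspond to the best possible score, the constant-model baseline, and a negative score for a model that does worse than the baseline.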
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (AdaptiveOverlapGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveOverlapGroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (AdaptiveOverlapGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveOverlapGroupLasso
- class sparselm.model.AdaptiveSparseGroupLasso(groups=None, l1_ratio=0.5, alpha=1.0, group_weights=None, max_iter=3, eps=1e-06, tol=1e-10, update_function=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=True, solver=None, solver_options=None)[source]#
Bases:
AdaptiveLasso, SparseGroupLasso
Adaptive Sparse Group Lasso, iteratively re-weighted sparse group lasso.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha r ||w \odot \beta||_1 + \alpha (1 - r) \sum_{G} v_G ||\beta_G||_2\]
where w and v are vectors of weights that are iteratively updated (\(\odot\) denotes the element-wise product), and r is the L1 ratio.
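To make the terms of this objective concrete, the sketch below evaluates it with NumPy for fixed weights. It reads the L1 term as an element-wise weighted sum of absolute coefficients and assumes v holds one weight per distinct group, ordered by sorted group label. The function name is hypothetical, and any scaling the library may apply internally is not represented.

```python
import numpy as np


def sparse_group_lasso_objective(X, y, beta, groups, alpha, l1_ratio, w, v):
    """Evaluate the documented objective for fixed weights (hypothetical helper)."""
    residual = X @ beta - y
    loss = residual @ residual                                # ||X beta - y||_2^2
    l1_term = alpha * l1_ratio * np.abs(w * beta).sum()       # alpha * r * weighted L1
    group_ids = np.unique(groups)                             # sorted distinct labels
    group_norms = np.array(
        [np.linalg.norm(beta[groups == g]) for g in group_ids]
    )
    group_term = alpha * (1.0 - l1_ratio) * (v * group_norms).sum()
    return loss + l1_term + group_term


X = np.eye(4)
y = np.zeros(4)
beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([0, 0, 1, 1])

value = sparse_group_lasso_objective(
    X, y, beta, groups, alpha=1.0, l1_ratio=0.5, w=np.ones(4), v=np.ones(2)
)
print(value)  # loss 25 + L1 term 3.5 + group term 2.5
```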
- Parameters:
groups (list or ndarray) – array-like of integers specifying groups. Its length should equal the number of model coefficients (n_features), where each integer entry specifies the group the corresponding coefficient belongs to.
l1_ratio (float) – Mixing parameter between l1 and group lasso regularization.
alpha (float) – Regularization hyper-parameter.
group_weights (ndarray) – optional. Weights for each group to use in the regularization term. The default is to use the sqrt of the group sizes; however, any weight can be specified. The array must be the same length as the groups given. If you need all groups weighted equally, just pass an array of ones.
max_iter (int) – Maximum number of re-weighting iteration steps.
eps (float) – Value to add to denominator of weights.
tol (float) – Absolute convergence tolerance for difference between weights at successive steps.
update_function (Callable) – optional. A function with signature f(group_norms, eps) used to update the weights at each iteration, where group_norms are the norms of the coefficients beta for each group. Default is 1/(group_norms + eps).
standardize (bool) – optional. Whether to standardize the group regularization penalty using the feature matrix. See the following for reference: http://faculty.washington.edu/nrsimon/standGL.pdf
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
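The update_function parameter above takes a callable with signature f(group_norms, eps). The sketch below writes out the documented default, 1/(group_norms + eps), next to a hypothetical alternative (sqrt_update is an illustration, not something the library provides). Passing either one as update_function=... follows the documented signature, but that call is not exercised here.

```python
import numpy as np


def default_like_update(group_norms, eps):
    """Documented default: weights are 1 / (group_norms + eps)."""
    return 1.0 / (group_norms + eps)


def sqrt_update(group_norms, eps):
    """Hypothetical alternative that penalizes small groups less aggressively."""
    return 1.0 / (np.sqrt(group_norms) + eps)


group_norms = np.array([0.0, 0.5, 2.0])   # example per-group coefficient norms
eps = 1e-6

weights = default_like_update(group_norms, eps)
alt_weights = sqrt_update(group_norms, eps)

# A group whose coefficients vanished gets a very large (but finite) weight,
# which is why eps is added to the denominator.
print(weights)
print(alt_weights)
```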
- Variables:
coef (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept (float) – Independent term in decision function.
canonicals (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (AdaptiveSparseGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveSparseGroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (AdaptiveSparseGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveSparseGroupLasso
- class sparselm.model.AdaptiveRidgedGroupLasso(groups=None, alpha=1.0, delta=(1.0,), group_weights=None, max_iter=3, eps=1e-06, tol=1e-10, update_function=None, standardize=False, fit_intercept=False, copy_X=True, warm_start=True, solver=None, solver_options=None)[source]#
Bases:
AdaptiveGroupLasso, RidgedGroupLasso
Adaptive Ridged Group Lasso implementation.
Regularized regression objective:
\[\min_{\beta} || X \beta - y ||^2_2 + \alpha \sum_{G} w_G ||\beta_G||_2 + \sum_{G} \delta_G ||\beta_G||^2_2\]
where G represents groups of features/coefficients, w_G are the group weights that are updated iteratively, and \(\delta_G\) are the ridge penalty weights (the delta parameter), which stay fixed.
For details on proper standardization refer to: http://faculty.washington.edu/nrsimon/standGL.pdf
Adaptive iterative weights are only done on the group norm and not the ridge portion.
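The interplay of max_iter, eps, and tol in the re-weighting scheme can be illustrated on a toy problem. The sketch below assumes X is the identity matrix, so that the objective || beta - y ||^2 + alpha * sum_G w_G ||beta_G||_2 + sum_G delta_G ||beta_G||^2 has a closed-form minimizer per group. Only the group-norm weights w_G are updated between solves, while the ridge weights stay fixed. The function names, the starting weights of one, and the way iterations are counted are all assumptions made for illustration; the library solves the general problem with cvxpy instead.

```python
import numpy as np


def solve_identity_design(y, groups, alpha, delta, weights):
    """Closed-form minimizer per group when X = I (toy setting only)."""
    beta = np.zeros_like(y)
    for k, g in enumerate(np.unique(groups)):
        mask = groups == g
        norm_y = np.linalg.norm(y[mask])
        if norm_y == 0.0:
            continue
        # Group soft-thresholding followed by ridge shrinkage.
        shrunk = max(norm_y - alpha * weights[k] / 2.0, 0.0) / (1.0 + delta[k])
        beta[mask] = y[mask] / norm_y * shrunk
    return beta


def adaptive_fit_sketch(y, groups, alpha, delta, max_iter=3, eps=1e-6, tol=1e-10):
    """Iteratively re-weight the group-norm penalty; delta is never updated."""
    y = np.asarray(y, dtype=float)
    group_ids = np.unique(groups)
    weights = np.ones(len(group_ids))            # assumed starting weights
    beta = solve_identity_design(y, groups, alpha, delta, weights)
    for _ in range(max_iter - 1):
        norms = np.array([np.linalg.norm(beta[groups == g]) for g in group_ids])
        new_weights = 1.0 / (norms + eps)        # documented default update
        converged = np.max(np.abs(new_weights - weights)) <= tol
        weights = new_weights
        if converged:
            break
        beta = solve_identity_design(y, groups, alpha, delta, weights)
    return beta, weights


groups = np.array([0, 0, 1, 1])
beta, weights = adaptive_fit_sketch(
    [3.0, 4.0, 0.1, 0.1], groups, alpha=1.0, delta=np.array([0.5, 0.5])
)
print(beta, weights)
```

In this example the weak second group is zeroed out in the first solve, so its weight becomes very large and it stays at zero, while the penalty on the strong first group is relaxed in later solves.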
- Parameters:
groups (list or ndarray) – array-like of integers specifying groups. Its length should equal the number of model coefficients (n_features), where each integer entry specifies the group the corresponding coefficient belongs to.
alpha (float) – Regularization hyper-parameter.
delta (ndarray) – optional. Positive 1D array. Regularization vector for ridge penalty.
group_weights (ndarray) – optional. Weights for each group to use in the regularization term. The default is to use the sqrt of the group sizes; however, any weight can be specified. The array must be the same length as the groups given. If you need all groups weighted equally, just pass an array of ones.
fit_intercept (bool) – Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
max_iter (int) – Maximum number of re-weighting iteration steps.
eps (float) – Value to add to denominator of weights.
tol (float) – Absolute convergence tolerance for difference between weights at successive steps.
update_function (Callable) – optional. A function with signature f(group_norms, eps) used to update the weights at each iteration, where group_norms are the norms of the coefficients beta for each group. Default is 1/(group_norms + eps).
copy_X (bool) – If True, X will be copied; else, it may be overwritten.
warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.
solver (str) – cvxpy backend solver to use. Supported solvers are listed here: https://www.cvxpy.org/tutorial/advanced/index.html#solve-method-options
solver_options (dict) – dictionary of keyword arguments passed to cvxpy solve. See docs in CVXRegressor for more information.
standardize (bool) – optional. Whether to standardize the group regularization penalty using the feature matrix. See the reference linked above.
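As an illustration of the groups and group_weights parameters above, the sketch below derives the square-root-of-group-size weights described as the default, along with an equal-weights alternative. It assumes one weight per distinct group label, ordered by sorted label; the variable names are hypothetical and the library's internal computation may differ.

```python
import numpy as np

# One integer label per coefficient: 9 coefficients in 3 groups.
groups = np.array([0, 0, 0, 0, 1, 2, 2, 2, 2])

# Distinct labels (sorted) and the number of coefficients in each group.
group_ids, sizes = np.unique(groups, return_counts=True)

# Square root of the group sizes, as described for the default weighting.
sqrt_size_weights = np.sqrt(sizes)

# Equal weighting for all groups: an array of ones, one entry per group.
equal_weights = np.ones(len(group_ids))

print(group_ids, sizes, sqrt_size_weights, equal_weights)
```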
- Variables:
coef (NDArray) – Parameter vector (\(\beta\) in the cost function formula) of shape (n_features,).
intercept (float) – Independent term in decision function.
canonicals (SimpleNamespace) –
Namespace that contains underlying cvxpy objects used to define the optimization problem. The objects included are the following:
objective - the objective function.
beta - variable to be optimized (corresponds to the estimated coef_ attribute).
parameters - hyper-parameters
auxiliaries - auxiliary variables and expressions
constraints - solution constraints
- add_constraints(constraints)#
Add a constraint to the problem.
Warning
Adding constraints will not work with any sklearn class that relies on cloning the estimator (e.g. GridSearchCV), because a new cvxpy problem is generated for each cloned estimator.
- Parameters:
constraints (list of cp.constraint or cp.expressions) – cvxpy constraint to add to the problem
- Return type:
None
- fit(X, y, sample_weight=None, *args, **kwargs)#
Fit the linear model coefficients.
Prepares the fit data input, generates cvxpy objects to represent the minimization objective, and solves the regression problem using the given solver.
- Parameters:
X (ArrayLike) – Training data of shape (n_samples, n_features).
y (ArrayLike) – Target values of shape (n_samples,) or (n_samples, n_targets). Will be cast to X’s dtype if necessary.
sample_weight (ArrayLike) – Individual weights for each sample, of shape (n_samples,). Default is None.
*args – Positional arguments passed to solve method
**kwargs – Keyword arguments passed to solve method
- Returns:
instance of self
- generate_problem(X, y, preprocess_data=True, sample_weight=None)#
Generate regression problem and auxiliary cvxpy objects.
This initializes the minimization problem, the objective, coefficient variable (beta), problem parameters, solution constraints, and auxiliary variables/terms.
This is (almost always) called in the fit method, and not directly. However, it can be called directly if further control over the problem is needed by accessing the canonicals_ objects, for example to add additional constraints on problem variables.
- Parameters:
X (ArrayLike) – Covariate/Feature matrix
y (ArrayLike) – Target vector
preprocess_data (bool) – Whether to preprocess the data before generating the problem. If calling generate_problem directly, this should be kept as True to ensure the problem is generated correctly for a subsequent call to fit.
sample_weight (ArrayLike) – Individual weights for each sample of shape (n_samples,) default=None. Only used if preprocess_data=True to rescale the data accordingly.
- Return type:
None
- get_metadata_routing()#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)#
Get parameters for this estimator.
- predict(X)#
Predict using the linear model.
- Parameters:
X (array-like or sparse matrix, shape (n_samples, n_features)) – Samples.
- Returns:
C – Returns predicted values.
- Return type:
array, shape (n_samples,)
- score(X, y, sample_weight=None)#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – \(R^2\) of self.predict(X) w.r.t. y.
- Return type:
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_fit_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
self (AdaptiveRidgedGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveRidgedGroupLasso
- set_params(**params)#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight='$UNCHANGED$')#
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a pipeline.Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (AdaptiveRidgedGroupLasso) –
- Returns:
self – The updated object.
- Return type:
AdaptiveRidgedGroupLasso