
statspai.selection

selection

Variable selection tools.

  • stepwise: Stepwise regression with AIC/BIC/p-value criteria
  • lasso_select: LASSO-based variable selection with coordinate descent
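The stepwise idea can be made concrete with a short NumPy sketch of forward selection by AIC. This is an illustration, not statspai's implementation. The helper names ols_aic and forward_stepwise_aic are hypothetical. An intercept is always included, and the Gaussian AIC is computed as n·ln(RSS/n) + 2k up to an additive constant. Backward and bidirectional search, BIC, and p-value rules are not shown.

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """Gaussian AIC (up to an additive constant) for an OLS fit of y on X.

    Hypothetical helper: AIC = n * ln(RSS / n) + 2k, k = number of columns of X.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def forward_stepwise_aic(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[str]:
    """Greedy forward selection: repeatedly add the variable that lowers AIC most.

    Hypothetical sketch; stops as soon as no remaining candidate improves AIC.
    """
    n = len(y)
    intercept = np.ones((n, 1))
    selected: list[int] = []
    best_aic = ols_aic(intercept, y)  # start from the intercept-only model
    while True:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        # AIC of the current model plus each single candidate column
        scores = {
            j: ols_aic(np.hstack([intercept, X[:, selected + [j]]]), y)
            for j in candidates
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_aic:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best_aic = scores[j_best]
    return [names[j] for j in selected]
```

Swapping the criterion (for example n·ln(RSS/n) + ln(n)·k for BIC) changes only ols_aic; the greedy search loop stays the same.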

SelectionResult dataclass

Result container for variable selection procedures.

Attributes:

  • selected (list[str]): Variable names retained in the final model.
  • dropped (list[str]): Variable names excluded from the final model.
  • history (DataFrame): Step-by-step log of the selection procedure.
  • final_model (dict): Summary statistics for the final model (R², adjusted R², AIC, BIC, n, k).
  • method (str): Selection method used: "forward", "backward", "both", "lasso_cv", "lasso_bic", or "lasso_aic".
  • coefficients (dict | None): Coefficient estimates from the final model, mapping variable name to value.
  • lasso_path (dict | None): For LASSO methods, the regularisation path, stored as {"lambdas": array, "coef_paths": dict}.
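The final_model statistics are standard OLS quantities. The following sketch shows one way to compute them. It assumes an intercept is included, k counts only the slope coefficients, and AIC/BIC come from the Gaussian log-likelihood with k + 1 mean parameters (the convention statsmodels uses for OLS). The function name ols_summary and its dictionary keys are hypothetical, and statspai's exact definitions of k, AIC, and BIC may differ.

```python
import numpy as np


def ols_summary(X: np.ndarray, y: np.ndarray) -> dict:
    """R², adjusted R², AIC, BIC, n and k for an OLS fit of y on [1, X].

    Hypothetical sketch. Assumes k = number of columns of X (slopes only) and
    AIC/BIC from the Gaussian log-likelihood with k + 1 parameters.
    """
    n, k = X.shape
    Z = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    # Maximised Gaussian log-likelihood with sigma² = RSS / n
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    p = k + 1  # slopes plus intercept
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "aic": -2 * llf + 2 * p,
        "bic": -2 * llf + np.log(n) * p,
        "n": n,
        "k": k,
    }
```

For x = [1, 2, 3, 4, 5] and y = [1, 3, 2, 5, 4], the fitted line is 0.6 + 0.8x with RSS = 3.6 and TSS = 10, giving R² = 0.64 and adjusted R² = 1 - 0.36 · 4/3 = 0.52.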

summary

summary() -> str

Return a human-readable summary and print it.

plot

plot(figsize=(10, 6))

Plot selection diagnostics: for stepwise methods, the criterion value at each step; for LASSO methods, each coefficient's path across the λ values.

Returns:

  • (fig, ax): the matplotlib Figure and Axes.

lasso_select

lasso_select(data: DataFrame, y: str, x: list[str], method: Literal['cv', 'bic', 'aic'] = 'cv', n_folds: int = 10, n_lambda: int = 100, eps: float = 0.001, max_iter: int = 1000, tol: float = 1e-06, verbose: bool = True, seed: int = 42) -> SelectionResult

LASSO-based variable selection.

Solves min over β of (1/(2n))||y - Xβ||² + λ||β||₁ via coordinate descent. The penalty λ is chosen by K-fold cross-validation, BIC, or AIC, as set by method.
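Each coordinate descent step has a closed form. Holding the other coefficients fixed, β_j is updated to S(ρ_j, λ) / ((1/n)||x_j||²), where ρ_j = (1/n) x_jᵀ(y - Σ_{k≠j} x_k β_k) and S(z, γ) = sign(z) · max(|z| - γ, 0) is soft-thresholding. The following NumPy sketch assumes y and the columns of X are already centred, so no intercept is fitted. The names soft_threshold and lasso_cd are hypothetical, and statspai's own standardisation, warm starts, and stopping rule may differ.

```python
from typing import Optional

import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def lasso_cd(
    X: np.ndarray,
    y: np.ndarray,
    lam: float,
    max_iter: int = 1000,
    tol: float = 1e-6,
    beta0: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Minimise (1/(2n))||y - X b||² + lam * ||b||₁ by cyclic coordinate descent.

    Hypothetical sketch: assumes y and the columns of X are centred (no intercept).
    Stops when the largest coefficient change in a full sweep is below tol.
    """
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.asarray(beta0, dtype=float).copy()
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n)||x_j||², curvature along coordinate j
    resid = y - X @ beta  # full residual, kept up to date after every change
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: beta_j keeps its starting value
            old = beta[j]
            # (1/n) x_j' (partial residual that excludes coordinate j)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta
```

Two checks follow from the update: with lam = 0 the iteration converges to the OLS solution, and for any lam ≥ max_j |x_jᵀy| / n every coefficient stays at zero.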

Parameters:

  • data (DataFrame, required): Input data containing the columns named by y and x.
  • y (str, required): Dependent variable column.
  • x (list[str], required): Candidate independent variables.
  • method (Literal['cv', 'bic', 'aic'], default "cv"): How to choose the regularisation parameter λ.
  • n_folds (int, default 10): Number of cross-validation folds; used only when method="cv".
  • n_lambda (int, default 100): Number of λ values in the grid.
  • eps (float, default 0.001): Ratio lambda_min / lambda_max, which sets the smallest λ in the grid.
  • max_iter (int, default 1000): Maximum number of coordinate descent iterations per λ.
  • tol (float, default 1e-06): Convergence tolerance for coordinate descent.
  • verbose (bool, default True): Print progress.
  • seed (int, default 42): Random seed for CV fold assignment.
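The next sketch illustrates how eps, n_lambda, n_folds, and seed could shape the search. It follows the common glmnet-style convention, which is assumed here and not confirmed for statspai. Under that convention, λ_max = max_j |x_jᵀy| / n is the smallest λ at which every coefficient is zero, the grid runs log-uniformly from λ_max down to eps · λ_max, and a seeded permutation assigns each row to one of n_folds folds. The function names lambda_grid and fold_ids are hypothetical.

```python
import numpy as np


def lambda_grid(X: np.ndarray, y: np.ndarray, n_lambda: int = 100, eps: float = 1e-3) -> np.ndarray:
    """Decreasing, log-spaced grid from lambda_max to eps * lambda_max.

    Hypothetical sketch: assumes y and the columns of X are centred.
    """
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n  # smallest lambda with an all-zero solution
    return np.geomspace(lam_max, eps * lam_max, num=n_lambda)


def fold_ids(n: int, n_folds: int = 10, seed: int = 42) -> np.ndarray:
    """Assign each of n rows to a fold in 0..n_folds-1, reproducibly for a given seed.

    Fold sizes differ by at most one because the labels are a shuffled round-robin.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(np.arange(n) % n_folds)
```

With method="cv", a typical rule picks the λ with the lowest average held-out squared error across folds. For "bic" or "aic", a common convention counts the nonzero coefficients at each λ as the model's degrees of freedom. This page does not say which rules statspai applies.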

Returns:

  • SelectionResult