statspai.selection
Variable selection tools.
- `stepwise`: Stepwise regression with AIC/BIC/p-value criteria
- `lasso_select`: LASSO-based variable selection with coordinate descent
SelectionResult (dataclass)
Result container for variable selection procedures.
Attributes:
| Name | Type | Description |
|---|---|---|
| `selected` | `list[str]` | Variable names retained in the final model. |
| `dropped` | `list[str]` | Variable names excluded from the final model. |
| `history` | `DataFrame` | Step-by-step log of the selection procedure. |
| `final_model` | `dict` | Summary statistics for the final model (R², adj-R², AIC, BIC, n, k). |
| `method` | `str` | Selection method used (…) |
| `coefficients` | `dict \| None` | Coefficient estimates from the final model (name -> value). |
| `lasso_path` | `dict \| None` | For LASSO: … |
lasso_select
lasso_select(data: DataFrame, y: str, x: list[str], method: Literal['cv', 'bic', 'aic'] = 'cv', n_folds: int = 10, n_lambda: int = 100, eps: float = 0.001, max_iter: int = 1000, tol: float = 1e-06, verbose: bool = True, seed: int = 42) -> SelectionResult
LASSO-based variable selection.
Solves min (1/2n)||y - Xβ||² + λ||β||₁ via coordinate descent. Selects λ by K-fold cross-validation, BIC, or AIC.
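The objective above can be minimised one coefficient at a time, with each update reducing to a soft-thresholding step. The following is a minimal sketch of that idea, not the library's implementation: the function names `soft_threshold` and `lasso_coordinate_descent` are hypothetical, and the sketch assumes `X` and `y` are already centred so that no intercept is needed.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-6):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Hypothetical helper for illustration; assumes centred X and y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j' x_j for each column
    residual = y.copy()                # equals y - X @ beta while beta is zero
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: its coefficient stays at 0
            # (1/n) * x_j' (partial residual that excludes column j)
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual -= delta * X[:, j]  # keep residual in sync with beta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # no coefficient moved by more than tol in this sweep
    return beta


# Orthogonal design with X'X / n = I, so the solution is S(X'y / n, lam).
X_demo = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
y_demo = np.array([3.0, 1.0, 3.0, 1.0])
beta_demo = lasso_coordinate_descent(X_demo, y_demo, lam=0.5)  # -> [1.5, 0.5]
```

In the demo, `X'y / n` is `[2, 1]`, so a penalty of `0.5` shrinks both coefficients by `0.5`, and any penalty of at least `2` sets both to zero.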
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | | required |
| `y` | `str` | Dependent variable column. | required |
| `x` | `list[str]` | Candidate independent variables. | required |
| `method` | `Literal['cv', 'bic', 'aic']` | How to choose the regularisation parameter λ. | `'cv'` |
| `n_folds` | `int` | Number of cross-validation folds (only for …) | `10` |
| `n_lambda` | `int` | Number of λ values in the grid. | `100` |
| `eps` | `float` | Ratio of lambda_min / lambda_max. | `0.001` |
| `max_iter` | `int` | Maximum coordinate descent iterations per λ. | `1000` |
| `tol` | `float` | Convergence tolerance. | `1e-06` |
| `verbose` | `bool` | Print progress. | `True` |
| `seed` | `int` | Random seed for CV fold assignment. | `42` |
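To illustrate how `n_lambda` and `eps` could together define the λ grid, here is a rough sketch. The source states only that `eps` is the ratio lambda_min / lambda_max and that `n_lambda` is the number of grid values. The log spacing, the formula for `lambda_max`, and the helper name `lambda_grid` are assumptions made for this example, not confirmed behaviour of `lasso_select`.

```python
import numpy as np


def lambda_grid(X, y, n_lambda=100, eps=0.001):
    """Hypothetical helper: log-spaced grid from lambda_max to eps * lambda_max.

    Assumes centred X and y and a non-zero X'y (geomspace needs a positive start).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # Smallest penalty at which every coefficient is zero under the
    # (1/2n)||y - X b||^2 + lambda * ||b||_1 objective.
    lambda_max = np.max(np.abs(X.T @ y)) / n
    # Assumption: geometric (log) spacing, largest penalty first.
    return np.geomspace(lambda_max, eps * lambda_max, num=n_lambda)


X_demo = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
y_demo = np.array([3.0, 1.0, 3.0, 1.0])
grid = lambda_grid(X_demo, y_demo, n_lambda=100, eps=0.001)
```

Ordering the grid from the largest penalty downward lets each fit start from the previous solution (a warm start), which is a common design choice for path algorithms.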
Returns:
| Type | Description |
|---|---|
| `SelectionResult` | |