statspai.experimental
Experimental design and analysis tools.
Provides randomization, balance checking, attrition analysis, and pre-analysis plan generation for RCTs.
RandomizationResult
Results from randomization.
BalanceResult
Results from balance check.
AttritionResult
Results from attrition analysis.
OptimalDesignResult
Results from optimal design calculation.
randomize
randomize(data: DataFrame, n_arms: int = 2, prob: List[float] = None, strata: str = None, cluster: str = None, method: str = 'simple', balance_vars: List[str] = None, n_rerand: int = 0, rerand_threshold: float = 0.001, seed: int = None, treatment_col: str = 'treatment') -> RandomizationResult
Randomize units to treatment and control.
Equivalent to R's randomizr::complete_ra(), block_ra(), and cluster_ra().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Data with units to randomize. | required |
| n_arms | int | Number of treatment arms. | 2 |
| prob | list of float | Probability of each arm. Default: equal. | None |
| strata | str | Stratification variable for block randomization. | None |
| cluster | str | Cluster variable for cluster randomization. | None |
| method | str | 'simple', 'complete', 'stratified', 'cluster'. | 'simple' |
| balance_vars | list of str | Variables to check balance on (for re-randomization). | None |
| n_rerand | int | Number of re-randomization iterations (0 = no re-randomization). | 0 |
| rerand_threshold | float | Mahalanobis distance threshold for re-randomization. | 0.001 |
| seed | int | Random seed for reproducibility. | None |
| treatment_col | str | Name of treatment column to create. | 'treatment' |
Returns:
| Type | Description |
|---|---|
| RandomizationResult | Results from randomization. |
Examples:
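A minimal, standard-library sketch of what complete and block randomization usually mean. It is an illustration under stated assumptions, not statspai's implementation: `complete_ra` and `block_ra` are hypothetical helpers named after the randomizr functions mentioned above, and units are plain Python values rather than DataFrame rows.

```python
import random
from collections import defaultdict


def complete_ra(units, n_arms=2, seed=None):
    """Assign units to arms so that arm sizes differ by at most one."""
    rng = random.Random(seed)
    # Build a balanced label vector (0, 1, ..., n_arms-1, 0, 1, ...) and shuffle it.
    labels = [i % n_arms for i in range(len(units))]
    rng.shuffle(labels)
    return dict(zip(units, labels))


def block_ra(units, strata, n_arms=2, seed=None):
    """Run complete randomization separately within each stratum."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit, stratum in zip(units, strata):
        blocks[stratum].append(unit)
    assignment = {}
    for stratum in sorted(blocks):  # fixed order keeps the result reproducible
        members = blocks[stratum]
        labels = [i % n_arms for i in range(len(members))]
        rng.shuffle(labels)
        assignment.update(zip(members, labels))
    return assignment


# Hypothetical sample: 12 units, half urban and half rural.
units = list(range(12))
strata = ["urban"] * 6 + ["rural"] * 6

simple = complete_ra(units, seed=42)
blocked = block_ra(units, strata, seed=42)

print("treated overall:", sum(simple.values()))
print("treated among urban units:", sum(blocked[u] for u in units[:6]))
```

With the library itself, the corresponding call would presumably follow the signature above, for example `randomize(df, strata='region', method='stratified', seed=42)`, where `df` and the `region` column are placeholders.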
balance_check
balance_check(data: DataFrame, treatment: str, covariates: List[str], alpha: float = 0.05) -> BalanceResult
Check covariate balance between treatment and control.
Computes normalized differences, t-tests, and omnibus F-test.
Equivalent to Stata's iebaltab and R's cobalt::bal.tab().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| treatment | str | Binary treatment variable (0/1). | required |
| covariates | list of str | Covariates to check balance on. | required |
| alpha | float | | 0.05 |
Returns:
| Type | Description |
|---|---|
| BalanceResult | Results from balance check. |
Examples:
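A minimal sketch of two of the statistics named above, computed by hand for a single covariate. It assumes one common definition of the normalized difference (difference in means divided by the square root of the average of the two group variances) and a Welch-style t-statistic; whether statspai uses exactly these formulas is an assumption. The helper names and the age data are hypothetical.

```python
import math
import statistics


def normalized_difference(treated, control):
    """Difference in means scaled by the root of the average group variance."""
    var_t = statistics.variance(treated)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    diff = statistics.mean(treated) - statistics.mean(control)
    return diff / math.sqrt((var_t + var_c) / 2)


def welch_t(treated, control):
    """t-statistic for a difference in means without assuming equal variances."""
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return (statistics.mean(treated) - statistics.mean(control)) / se


# Hypothetical baseline ages in each arm.
age_treated = [34, 29, 41, 38, 30, 36]
age_control = [33, 31, 40, 35, 32, 36]

nd = normalized_difference(age_treated, age_control)
t_stat = welch_t(age_treated, age_control)
print(f"normalized difference: {nd:.3f}, t-statistic: {t_stat:.3f}")
```

Unlike the t-statistic, the normalized difference does not grow mechanically with sample size, which is why balance tables often report both. With the library, the call would presumably be `balance_check(df, treatment='treatment', covariates=['age'])`, with `df` and `age` as placeholders.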
attrition_test
attrition_test(data: DataFrame, treatment: str, observed: str, covariates: List[str] = None) -> AttritionResult
Test for differential attrition in an RCT.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| treatment | str | Treatment indicator (0/1). | required |
| observed | str | Indicator for whether outcome is observed (1) or missing (0). | required |
| covariates | list of str | Baseline covariates to test as predictors of attrition. | None |
Returns:
| Type | Description |
|---|---|
| AttritionResult | Results from attrition analysis. |
Examples:
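A minimal sketch of the core idea behind a differential attrition test: compare the share of units with a missing outcome across arms. It assumes a pooled two-proportion z-test with a normal approximation, which may differ from what statspai computes, and it leaves out the covariate-based part of the test. The function name and the data are hypothetical.

```python
import math
from statistics import NormalDist


def differential_attrition(treatment, observed):
    """Compare attrition rates across arms with a two-proportion z-test."""
    n_t = sum(1 for t in treatment if t == 1)
    n_c = len(treatment) - n_t
    # A unit has attrited when its outcome is not observed (observed == 0).
    lost_t = sum(1 for t, o in zip(treatment, observed) if t == 1 and o == 0)
    lost_c = sum(1 for t, o in zip(treatment, observed) if t == 0 and o == 0)
    rate_t, rate_c = lost_t / n_t, lost_c / n_c
    # Pooled attrition rate under the null of equal attrition in both arms.
    pooled = (lost_t + lost_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (rate_t - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"rate_treated": rate_t, "rate_control": rate_c,
            "difference": rate_t - rate_c, "z": z, "p_value": p_value}


# Hypothetical trial: 50 units per arm, 10 treated and 5 control units lost.
treatment = [1] * 50 + [0] * 50
observed = [1] * 40 + [0] * 10 + [1] * 45 + [0] * 5

result = differential_attrition(treatment, observed)
print(f"difference: {result['difference']:.2f}, p-value: {result['p_value']:.3f}")
```

With the library, the call would presumably be `attrition_test(df, treatment='treatment', observed='observed')`, with `df` and the column names as placeholders.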
attrition_bounds
attrition_bounds(data: DataFrame, y: str, treatment: str, observed: str = None, method: str = 'lee', alpha: float = 0.05) -> Dict[str, Any]
Compute bounds on treatment effects under attrition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| treatment | str | Treatment indicator (0/1). | required |
| observed | str | Indicator for observed outcome. If None, uses non-missing y. | None |
| method | str | Bounding method: 'lee' (Lee 2009), 'manski' (worst-case). | 'lee' |
| alpha | float | | 0.05 |
Returns:
| Type | Description |
|---|---|
| dict | Keys: 'lower_bound', 'upper_bound', 'naive_ate', 'method', 'n_obs'. |
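A minimal sketch of the trimming idea behind Lee (2009) bounds: the arm with the higher response rate is trimmed from the top or the bottom of its outcome distribution until the response rates match, which gives a lower and an upper bound. This is a simplified illustration, not statspai's implementation. It rounds the number of trimmed observations rather than using exact quantiles, it assumes both arms still have observations left after trimming, and it reuses the documented key names purely for readability. The function name and the data are hypothetical.

```python
def lee_bounds_sketch(y, treatment, observed):
    """Trim the arm with less attrition to bound the treatment effect."""
    y1 = sorted(v for v, t, o in zip(y, treatment, observed) if t == 1 and o == 1)
    y0 = sorted(v for v, t, o in zip(y, treatment, observed) if t == 0 and o == 1)
    n1 = sum(1 for t in treatment if t == 1)
    n0 = len(treatment) - n1
    r1, r0 = len(y1) / n1, len(y0) / n0  # response rates by arm

    def mean(values):
        return sum(values) / len(values)

    if r1 >= r0:
        # Treated arm responds more: trim k treated outcomes.
        k = round(len(y1) * (r1 - r0) / r1)
        lower = mean(y1[:len(y1) - k]) - mean(y0)  # drop the k largest
        upper = mean(y1[k:]) - mean(y0)            # drop the k smallest
    else:
        # Control arm responds more: trim k control outcomes.
        k = round(len(y0) * (r0 - r1) / r0)
        lower = mean(y1) - mean(y0[k:])            # drop the k smallest
        upper = mean(y1) - mean(y0[:len(y0) - k])  # drop the k largest
    return {"lower_bound": lower, "upper_bound": upper,
            "naive_ate": mean(y1) - mean(y0), "method": "lee",
            "n_obs": len(y1) + len(y0)}


# Hypothetical data: all 10 treated outcomes observed, 8 of 10 control outcomes.
treatment = [1] * 10 + [0] * 10
y = list(range(1, 11)) + list(range(1, 9)) + [None, None]
observed = [1] * 10 + [1] * 8 + [0, 0]

bounds = lee_bounds_sketch(y, treatment, observed)
print(bounds)
```

The naive difference in observed means lies between the two bounds here. The 'manski' method is not sketched because worst-case bounds additionally require known limits on the outcome.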
optimal_design
optimal_design(design: str = 'individual', sigma: float = 1.0, mde: float = None, power: float = 0.8, alpha: float = 0.05, n_arms: int = 2, prop_treat: float = 0.5, icc: float = 0.0, cluster_size: int = None, n_clusters: int = None, cost_per_cluster: float = None, cost_per_unit: float = None, r2: float = 0.0, baseline_mean: float = 0.0) -> OptimalDesignResult
Compute optimal sample size and design parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| design | str | 'individual', 'cluster', 'stratified'. | 'individual' |
| sigma | float | Standard deviation of the outcome. | 1.0 |
| mde | float | Minimum detectable effect. If None, compute MDE given n. | None |
| power | float | Statistical power (1 - Type II error). | 0.8 |
| alpha | float | Significance level. | 0.05 |
| n_arms | int | Number of treatment arms. | 2 |
| prop_treat | float | Proportion assigned to treatment. | 0.5 |
| icc | float | Intra-cluster correlation (for cluster designs). | 0.0 |
| cluster_size | int | Average cluster size. | None |
| n_clusters | int | Number of clusters (if fixed). | None |
| cost_per_cluster | float | Cost of adding a cluster (for optimal allocation). | None |
| cost_per_unit | float | Cost per individual unit. | None |
| r2 | float | R-squared from baseline covariates (variance reduction). | 0.0 |
| baseline_mean | float | | 0.0 |
Returns:
| Type | Description |
|---|---|
| OptimalDesignResult | Results from optimal design calculation. |
Examples:
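A minimal sketch of the textbook power calculation that parameters such as `sigma`, `mde`, `power`, `alpha`, `prop_treat`, `icc`, `cluster_size`, and `r2` typically feed into. It assumes a two-arm, two-sided test with a normal approximation and the usual design effect 1 + (m - 1) * icc for clusters of size m; statspai's own formulas may differ. `required_n` and `mde_for_n` are hypothetical helper names.

```python
import math
from statistics import NormalDist


def _z_sum(power, alpha):
    """Sum of the critical values for a two-sided test at the given power."""
    norm = NormalDist()
    return norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)


def required_n(mde, sigma=1.0, power=0.8, alpha=0.05,
               prop_treat=0.5, icc=0.0, cluster_size=1, r2=0.0):
    """Total sample size needed to detect an effect of size `mde`."""
    deff = 1 + (cluster_size - 1) * icc  # design effect for clustering
    variance = sigma ** 2 * (1 - r2) * deff
    n = _z_sum(power, alpha) ** 2 * variance / (
        prop_treat * (1 - prop_treat) * mde ** 2
    )
    return math.ceil(n)


def mde_for_n(n, sigma=1.0, power=0.8, alpha=0.05,
              prop_treat=0.5, icc=0.0, cluster_size=1, r2=0.0):
    """Minimum detectable effect for a given total sample size."""
    deff = 1 + (cluster_size - 1) * icc
    variance = sigma ** 2 * (1 - r2) * deff
    return _z_sum(power, alpha) * math.sqrt(
        variance / (prop_treat * (1 - prop_treat) * n)
    )


n_individual = required_n(mde=0.2)
n_clustered = required_n(mde=0.2, icc=0.05, cluster_size=20)
print("individual design:", n_individual)
print("cluster design:", n_clustered)
print("MDE at that n:", round(mde_for_n(n_individual), 3))
```

Clustering inflates the required sample through the design effect, while a higher `r2` from baseline covariates shrinks it. With the library, the call would presumably be `optimal_design(design='cluster', mde=0.2, icc=0.05, cluster_size=20)`, per the signature above.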