
statspai.experimental


Experimental design and analysis tools.

Provides randomization, balance checking, attrition analysis, and pre-analysis plan generation for RCTs.

RandomizationResult

Results from randomization.

BalanceResult

Results from balance check.

plot

plot(ax=None, **kwargs)

Love plot of normalized differences.

AttritionResult

Results from attrition analysis.

OptimalDesignResult

Results from optimal design calculation.

randomize

randomize(data: DataFrame, n_arms: int = 2, prob: List[float] = None, strata: str = None, cluster: str = None, method: str = 'simple', balance_vars: List[str] = None, n_rerand: int = 0, rerand_threshold: float = 0.001, seed: int = None, treatment_col: str = 'treatment') -> RandomizationResult

Randomize units to treatment and control.

Equivalent to R's randomizr::complete_ra() / block_ra() / cluster_ra().
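To make the block (stratified) option concrete, here is a minimal NumPy/pandas sketch of two-arm complete randomization within strata, in the spirit of `randomizr::block_ra()`. It is not the statspai implementation. The helper name `block_complete_ra` and its rounding rule (`round(n * prob)` treated units per stratum) are illustrative assumptions.

```python
from typing import Optional

import numpy as np
import pandas as pd


def block_complete_ra(data: pd.DataFrame, strata: str, prob: float = 0.5,
                      seed: Optional[int] = None) -> pd.Series:
    """Two-arm complete randomization within each stratum (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    assignment = pd.Series(0, index=data.index, name="treatment")
    for idx in data.groupby(strata).groups.values():
        n = len(idx)
        n_treat = int(round(n * prob))  # exact number treated in this stratum
        labels = np.array([1] * n_treat + [0] * (n - n_treat))
        rng.shuffle(labels)  # random permutation of the fixed labels
        assignment.loc[idx] = labels
    return assignment


# Hypothetical data: 4 units in district A, 6 in district B.
df = pd.DataFrame({"district": ["A"] * 4 + ["B"] * 6})
df["treatment"] = block_complete_ra(df, strata="district", seed=42)
print(df.groupby("district")["treatment"].sum())
```

Fixing the number treated in each stratum (rather than flipping an independent coin per unit, as simple randomization does) guarantees the treated share is the same in every stratum up to rounding.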

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | DataFrame | Data with units to randomize. | *required* |
| `n_arms` | int | Number of treatment arms. | `2` |
| `prob` | list of float | Assignment probability for each arm. If None, all arms are equally likely. | `None` |
| `strata` | str | Stratification variable for block randomization. | `None` |
| `cluster` | str | Cluster variable for cluster randomization. | `None` |
| `method` | str | Randomization method: `'simple'`, `'complete'`, `'stratified'`, or `'cluster'`. | `'simple'` |
| `balance_vars` | list of str | Variables to check balance on (used for re-randomization). | `None` |
| `n_rerand` | int | Number of re-randomization iterations (0 = no re-randomization). | `0` |
| `rerand_threshold` | float | Mahalanobis distance threshold for re-randomization. | `0.001` |
| `seed` | int | Random seed for reproducibility. | `None` |
| `treatment_col` | str | Name of the treatment column to create. | `'treatment'` |

Returns:

| Type | Description |
|---|---|
| RandomizationResult | Results from randomization. |

Examples:

>>> import statspai as sp
>>> result = sp.randomize(df, strata='district', balance_vars=['age', 'income'])
>>> print(result.summary())
>>> df_randomized = result.data
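The `n_rerand` and `rerand_threshold` options describe re-randomization: redraw the assignment until a covariate-imbalance statistic is small enough. The sketch below illustrates that idea for a two-arm complete design, assuming the Morgan-Rubin Mahalanobis statistic M. The helper names and the "keep the best draw if none passes" fallback are hypothetical, and how `sp.randomize` scales or interprets `rerand_threshold` is not stated here, so the threshold values below need not match it.

```python
from typing import Optional, Tuple

import numpy as np


def mahalanobis_balance(X: np.ndarray, z: np.ndarray) -> float:
    """Morgan-Rubin statistic M = n1*n0/n * d' S^{-1} d for a 0/1 assignment z."""
    n1, n0 = int(z.sum()), int((1 - z).sum())
    d = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)  # difference in covariate means
    S = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance of the covariates
    return n1 * n0 / (n1 + n0) * float(d @ np.linalg.solve(S, d))


def rerandomize(X: np.ndarray, n_treat: int, threshold: float,
                max_draws: int = 1000, seed: Optional[int] = None) -> Tuple[np.ndarray, float]:
    """Redraw complete randomizations until M <= threshold or max_draws is reached."""
    rng = np.random.default_rng(seed)
    base = np.array([1] * n_treat + [0] * (X.shape[0] - n_treat))
    best_z, best_m = base, np.inf
    for _ in range(max_draws):
        z = rng.permutation(base)
        m = mahalanobis_balance(X, z)
        if m < best_m:  # remember the best-balanced draw seen so far
            best_z, best_m = z, m
        if m <= threshold:  # accept the first draw that passes
            break
    return best_z, best_m


gen = np.random.default_rng(0)
X = gen.normal(size=(40, 2))  # hypothetical covariates, e.g. standardized age and income
z, m = rerandomize(X, n_treat=20, threshold=0.5, seed=1)
print(z.sum(), round(m, 3))
```

Because accepted assignments are no longer drawn uniformly from all possible assignments, inference after re-randomization is usually done with randomization tests that respect the same acceptance rule.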

balance_check

balance_check(data: DataFrame, treatment: str, covariates: List[str], alpha: float = 0.05) -> BalanceResult

Check covariate balance between treatment and control.

Computes normalized differences, per-covariate t-tests, and an omnibus F-test of joint balance.

Equivalent to Stata's iebaltab and R's cobalt::bal.tab().
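As an illustration of the per-covariate statistics, the sketch below uses a common convention for the normalized difference (difference in means divided by the square root of the average of the two group variances) and a Welch t-statistic. Whether `balance_check` uses exactly these definitions is an assumption, and `normalized_differences` is a hypothetical helper, not part of statspai.

```python
import numpy as np
import pandas as pd


def normalized_differences(data: pd.DataFrame, treatment: str, covariates: list) -> pd.DataFrame:
    """Per-covariate means, normalized differences and Welch t-statistics (sketch)."""
    treated = data[data[treatment] == 1]
    control = data[data[treatment] == 0]
    rows = []
    for var in covariates:
        x1 = treated[var].to_numpy(dtype=float)
        x0 = control[var].to_numpy(dtype=float)
        m1, m0 = x1.mean(), x0.mean()
        v1, v0 = x1.var(ddof=1), x0.var(ddof=1)
        rows.append({
            "variable": var,
            "mean_treat": m1,
            "mean_control": m0,
            # scale-free imbalance: mean difference over sqrt of the average variance
            "norm_diff": (m1 - m0) / np.sqrt((v1 + v0) / 2),
            # unequal-variance (Welch) t-statistic for the difference in means
            "t_stat": (m1 - m0) / np.sqrt(v1 / len(x1) + v0 / len(x0)),
        })
    return pd.DataFrame(rows).set_index("variable")


df = pd.DataFrame({"treated": [1, 1, 1, 0, 0, 0], "age": [2, 4, 6, 1, 2, 3]})
print(normalized_differences(df, "treated", ["age"]))
```

Unlike the t-statistic, the normalized difference does not grow with sample size, which is why it is the usual quantity shown on a Love plot.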

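Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | DataFrame | Data containing the treatment variable and covariates. | *required* |
| `treatment` | str | Binary treatment variable (0/1). | *required* |
| `covariates` | list of str | Covariates to check balance on. | *required* |
| `alpha` | float | Significance level. | `0.05` |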

Returns:

| Type | Description |
|---|---|
| BalanceResult | Results from balance check. |

Examples:

>>> import statspai as sp
>>> bal = sp.balance_check(df, treatment='treated', covariates=['age', 'income', 'education'])
>>> print(bal.summary())
>>> bal.plot()
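One common form of the omnibus test regresses the treatment indicator on all covariates and tests that every slope is zero. A minimal sketch, assuming a homoskedastic F-statistic (balance tables such as `iebaltab` may use other variance estimators, and `balance_check` may differ); `omnibus_f` is a hypothetical helper.

```python
import numpy as np


def omnibus_f(X: np.ndarray, d: np.ndarray) -> float:
    """Homoskedastic F-statistic for H0: all slopes are zero in OLS of d on [1, X]."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # add an intercept
    beta, *_ = np.linalg.lstsq(design, d, rcond=None)
    resid = d - design @ beta
    centered = d - d.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return float((r2 / k) / ((1 - r2) / (n - k - 1)))


X = np.array([[0.0], [1.0], [2.0], [3.0]])  # one hypothetical covariate
d = np.array([0.0, 0.0, 1.0, 1.0])           # treatment indicator
print(omnibus_f(X, d))
```

A p-value follows from the F distribution with `k` and `n - k - 1` degrees of freedom, for example `scipy.stats.f.sf(F, k, n - k - 1)`.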

attrition_test

attrition_test(data: DataFrame, treatment: str, observed: str, covariates: List[str] = None) -> AttritionResult

Test for differential attrition in an RCT.

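Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | DataFrame | Data containing the treatment, observation indicator, and covariates. | *required* |
| `treatment` | str | Treatment indicator (0/1). | *required* |
| `observed` | str | Indicator for whether the outcome is observed (1) or missing (0). | *required* |
| `covariates` | list of str | Baseline covariates to test as predictors of attrition. | `None` |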

Returns:

| Type | Description |
|---|---|
| AttritionResult | Results from attrition analysis. |

Examples:

>>> import statspai as sp
>>> result = sp.attrition_test(df, treatment='treated', observed='endline_observed',
...                            covariates=['age', 'income', 'education'])
>>> print(result.summary())
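The simplest check for differential attrition compares the share of units observed at endline across arms. The sketch below does this with a two-sample z-test on proportions; it is an illustrative assumption about one part of such an analysis (it ignores the covariate-based tests), not the statspai implementation, and `differential_attrition` is a hypothetical helper.

```python
import math

import numpy as np


def differential_attrition(treat, observed) -> dict:
    """Two-sample z-test for a difference in observation (non-attrition) rates."""
    treat = np.asarray(treat)
    observed = np.asarray(observed, dtype=float)
    r1 = observed[treat == 1].mean()  # share observed at endline, treatment arm
    r0 = observed[treat == 0].mean()  # share observed at endline, control arm
    n1, n0 = int((treat == 1).sum()), int((treat == 0).sum())
    # Unpooled standard error; undefined if both rates are exactly 0 or 1.
    se = math.sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
    z = (r1 - r0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"rate_treat": r1, "rate_control": r0, "diff": r1 - r0, "z": z, "p_value": p_value}


treat = [1] * 10 + [0] * 10
observed = [1] * 9 + [0] + [1] * 7 + [0] * 3  # hypothetical endline_observed flags
print(differential_attrition(treat, observed))
```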

attrition_bounds

attrition_bounds(data: DataFrame, y: str, treatment: str, observed: str = None, method: str = 'lee', alpha: float = 0.05) -> Dict[str, Any]

Compute bounds on treatment effects under attrition.

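Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | DataFrame | Data containing the outcome, treatment, and observation indicator. | *required* |
| `y` | str | Outcome variable. | *required* |
| `treatment` | str | Treatment indicator (0/1). | *required* |
| `observed` | str | Indicator for an observed outcome. If None, non-missing values of `y` are treated as observed. | `None` |
| `method` | str | Bounding method: `'lee'` (Lee 2009) or `'manski'` (worst-case bounds). | `'lee'` |
| `alpha` | float | Significance level. | `0.05` |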

Returns:

| Type | Description |
|---|---|
| dict | Keys: `'lower_bound'`, `'upper_bound'`, `'naive_ate'`, `'method'`, `'n_obs'`. |
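The `'lee'` method refers to trimming bounds. Under Lee's monotonicity assumption, the arm with the higher observation rate is trimmed by the excess share, once from the top and once from the bottom of its observed outcome distribution, giving bounds on the effect for units whose outcome would be observed in either arm. The sketch below is a simplified point-estimate version (no confidence intervals, naive handling of ties and rounding); `lee_bounds` is a hypothetical helper and not the statspai code.

```python
import numpy as np


def lee_bounds(y, treat, observed) -> dict:
    """Lee-style trimming bounds on the treatment effect (simplified sketch)."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat)
    observed = np.asarray(observed).astype(bool)
    q1 = observed[treat == 1].mean()  # observation rate, treatment arm
    q0 = observed[treat == 0].mean()  # observation rate, control arm
    y1 = np.sort(y[(treat == 1) & observed])
    y0 = np.sort(y[(treat == 0) & observed])
    if q1 >= q0:
        # Trim the treatment arm, which has the excess of observed units.
        p = (q1 - q0) / q1
        keep = int(round((1 - p) * len(y1)))  # assumes keep > 0
        lower = y1[:keep].mean() - y0.mean()            # drop the largest outcomes
        upper = y1[len(y1) - keep:].mean() - y0.mean()  # drop the smallest outcomes
    else:
        # Trim the control arm instead; the roles of top and bottom flip.
        p = (q0 - q1) / q0
        keep = int(round((1 - p) * len(y0)))
        lower = y1.mean() - y0[len(y0) - keep:].mean()
        upper = y1.mean() - y0[:keep].mean()
    return {"lower_bound": lower, "upper_bound": upper, "trim_share": p}


treat = [1] * 5 + [0] * 5
observed = [1] * 5 + [1, 1, 1, 1, 0]                 # one control outcome missing
y = [1, 2, 3, 4, 5, 1, 2, 3, 4, np.nan]             # hypothetical outcomes
print(lee_bounds(y, treat, observed))
```

Worst-case (`'manski'`) bounds instead impute missing outcomes at the smallest and largest possible values of `y`, so they need no monotonicity assumption but are typically much wider.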

optimal_design

optimal_design(design: str = 'individual', sigma: float = 1.0, mde: float = None, power: float = 0.8, alpha: float = 0.05, n_arms: int = 2, prop_treat: float = 0.5, icc: float = 0.0, cluster_size: int = None, n_clusters: int = None, cost_per_cluster: float = None, cost_per_unit: float = None, r2: float = 0.0, baseline_mean: float = 0.0) -> OptimalDesignResult

Compute optimal sample size and design parameters.
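For reference, the textbook two-arm approximation behind calculations of this kind links the MDE to the parameters below, with $P$ = `prop_treat`, $N$ the total number of units, $m$ = `cluster_size`, $\rho$ = `icc`, $R^2$ = `r2`, and $1-\beta$ = `power`. Whether `optimal_design` uses exactly this expression (for example, a normal rather than a $t$ multiplier, or a single $R^2$ for both variance components in cluster designs) is an assumption:

$$
\mathrm{MDE} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sqrt{\frac{\sigma^2\,(1 - R^2)\,\left[1 + (m-1)\rho\right]}{P(1-P)\,N}}
$$

Setting $\rho = 0$ (or $m = 1$) gives the individual-level design, and solving for $N$ gives the required sample size:

$$
N = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2 (1-R^2)\left[1 + (m-1)\rho\right]}{P(1-P)\,\mathrm{MDE}^2}
$$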

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `design` | str | Design type: `'individual'`, `'cluster'`, or `'stratified'`. | `'individual'` |
| `sigma` | float | Standard deviation of the outcome. | `1.0` |
| `mde` | float | Minimum detectable effect. If None, the MDE is computed for the given sample size. | `None` |
| `power` | float | Statistical power (1 - Type II error rate). | `0.8` |
| `alpha` | float | Significance level. | `0.05` |
| `n_arms` | int | Number of treatment arms. | `2` |
| `prop_treat` | float | Proportion assigned to treatment. | `0.5` |
| `icc` | float | Intra-cluster correlation (for cluster designs). | `0.0` |
| `cluster_size` | int | Average cluster size. | `None` |
| `n_clusters` | int | Number of clusters (if fixed). | `None` |
| `cost_per_cluster` | float | Cost of adding a cluster (used for optimal allocation). | `None` |
| `cost_per_unit` | float | Cost per individual unit. | `None` |
| `r2` | float | R-squared from baseline covariates (used for variance reduction). | `0.0` |
| `baseline_mean` | float | Baseline mean of the outcome. | `0.0` |

Returns:

| Type | Description |
|---|---|
| OptimalDesignResult | Results from optimal design calculation. |

Examples:

>>> import statspai as sp
>>> result = sp.optimal_design(mde=0.2, sigma=1.0, icc=0.05, cluster_size=20)
>>> print(result.summary())