Choosing a matching / weighting estimator

When your design relies on selection on observables (the conditional independence assumption, CIA, also called unconfoundedness) and the treatment is binary, StatsPAI offers at least seven estimators. This guide shows how to choose among them.

0. TL;DR flowchart

Is your covariate set high-dimensional (p > 20)?
  YES -> Double ML (sp.dml), meta-learners (sp.S_Learner, etc.)
  NO  -> continue

Is your target the ATT (effect on the treated)?
  YES -> sp.ebalance (entropy balancing) or sp.match(estimand='ATT')
  NO  -> continue

Is your target the ATE (population average)?
  YES -> sp.cbps(estimand='ATE') or sp.aipw
  NO  -> continue

Do you need OVERLAP-weighted effect (avoiding extrapolation)?
  YES -> sp.overlap_weights (ATO)
  NO  -> rethink — what estimand do you actually want?

1. Entropy balancing (ebal) — the "just works" default for ATT

Hainmueller (2012). Reweights the control group so that the chosen covariate moments match the treated group exactly; no propensity-score model is needed.

import statspai as sp  # assumed alias; every sp.* call in this guide relies on it

r = sp.ebalance(df, y='y', treat='d',
                covariates=['X1', 'X2', 'X3'],
                moments=1)  # balance means; moments=2 adds variances

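Pros: no propensity-score model to specify; exact balance on the chosen moments by construction; avoids the propensity-score matching problem described by King & Nielsen (2019). Cons: targets the ATT only; can produce extreme weights when overlap is poor, so check the effective sample size.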
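
To make the reweighting idea concrete, here is a minimal NumPy sketch of mean-only entropy balancing. It is not StatsPAI's implementation: the function name `entropy_balance_weights` is hypothetical, the simulated data are illustrative, and it assumes uniform base weights and first moments only. It minimises the convex dual of the entropy problem with damped Newton steps.

```python
import numpy as np

def entropy_balance_weights(X_control, target_means, max_iter=200, tol=1e-10):
    """Weights on control rows whose weighted covariate means equal target_means."""
    # Centre the control covariates on the target (treated) means.
    Z = np.asarray(X_control, dtype=float) - np.asarray(target_means, dtype=float)
    lam = np.zeros(Z.shape[1])

    def weights(l):
        s = Z @ l
        w = np.exp(s - s.max())            # subtract the max for numerical stability
        return w / w.sum()

    def dual(l):
        s = Z @ l
        return s.max() + np.log(np.exp(s - s.max()).sum())   # log-sum-exp

    for _ in range(max_iter):
        w = weights(lam)
        grad = Z.T @ w                      # weighted control mean minus target
        if np.max(np.abs(grad)) < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)  # weighted covariance
        step = np.linalg.solve(hess + 1e-12 * np.eye(len(lam)), grad)
        t, f0 = 1.0, dual(lam)
        while dual(lam - t * step) > f0 + 1e-12 and t > 1e-8:
            t *= 0.5                        # backtrack until the dual stops increasing
        lam = lam - t * step
    return weights(lam)

# Illustrative data: treated units have shifted covariate means.
rng = np.random.default_rng(0)
X_treated = rng.normal(0.5, 1.0, size=(200, 2))
X_control = rng.normal(0.0, 1.0, size=(500, 2))

w = entropy_balance_weights(X_control, X_treated.mean(axis=0))
balanced_means = w @ X_control              # reweighted control means
ess = 1.0 / np.sum(w ** 2)                  # effective sample size of the controls
```

After reweighting, `balanced_means` agrees with the treated means up to solver tolerance, and `ess` shows how many controls the weights effectively use. A low value signals the extreme-weight problem noted above.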

2. Nearest-neighbor matching

Beware: King & Nielsen (2019) show that PSM-based nearest-neighbor matching can increase imbalance. Prefer Mahalanobis or coarsened exact matching (CEM):

r = sp.match(df, y='y', treat='d', covariates=[...],
             distance='mahalanobis',  # NOT 'propensity'
             method='nearest', n_matches=3)
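
As a rough illustration of what Mahalanobis nearest-neighbor matching does, the sketch below matches each treated unit to its k closest controls (with replacement) and averages the outcome differences. The function `mahalanobis_att` is hypothetical and is not the StatsPAI routine; using the pooled sample covariance is an assumption, and the data are simulated with a true effect of 2.

```python
import numpy as np

def mahalanobis_att(X, d, y, n_matches=3):
    """ATT from k-nearest-neighbor matching on Mahalanobis distance (with replacement)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d).astype(bool)
    y = np.asarray(y, dtype=float)
    # Inverse of the pooled sample covariance defines the distance metric.
    VI = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    Xt, Xc = X[d], X[~d]
    diff = Xt[:, None, :] - Xc[None, :, :]               # shape (n_treated, n_control, p)
    dist2 = np.einsum('ijk,kl,ijl->ij', diff, VI, diff)  # squared Mahalanobis distances
    idx = np.argsort(dist2, axis=1)[:, :n_matches]       # k closest controls per treated unit
    y_matched = y[~d][idx].mean(axis=1)                  # mean outcome of matched controls
    return float(np.mean(y[d] - y_matched)), idx

# Illustrative data: selection on X[:, 0], true effect = 2.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X[:, 0])))
y = 2.0 * d + X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

att, idx = mahalanobis_att(X, d, y, n_matches=3)
```

The sketch applies no caliper and no bias correction, so it understates what a production matcher does. Its only purpose is to show that the distance is computed on the covariates themselves rather than on an estimated propensity score.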

3. Covariate Balancing Propensity Score (CBPS)

Imai-Ratkovic (2014). Fits the propensity score to balance covariates directly, not to maximise likelihood.

r = sp.cbps(df, y='y', treat='d', covariates=[...],
            estimand='ATE',  # or 'ATT'
            variant='over')   # 'over' (overidentified) is preferred

More robust to propensity-score misspecification than IPW with a maximum-likelihood propensity score.
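
The "fit to balance" idea can be sketched for the simplest case: a logistic propensity model whose coefficients solve the ATE balance conditions exactly (the just-identified case). This is an illustration under those assumptions, not the overidentified GMM variant recommended above and not StatsPAI's code; `cbps_just_identified` is a hypothetical name.

```python
import numpy as np

def cbps_just_identified(X, d, max_iter=100, tol=1e-10):
    """Logistic coefficients solving mean((d/e - (1-d)/(1-e)) * x) = 0,
    where e = 1 / (1 + exp(-x @ beta)) and x includes an intercept."""
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones(len(d)), np.asarray(X, dtype=float)])
    beta = np.zeros(X.shape[1])

    def objective(b):
        # Concave function whose gradient is the balance condition.
        s = X @ b
        return np.mean(d * (s - np.exp(-s)) - (1 - d) * (s + np.exp(s)))

    for _ in range(max_iter):
        s = X @ beta
        h = d * (1 + np.exp(-s)) - (1 - d) * (1 + np.exp(s))   # equals d/e - (1-d)/(1-e)
        g = X.T @ h / len(d)                                    # covariate imbalance
        if np.max(np.abs(g)) < tol:
            break
        c = d * np.exp(-s) + (1 - d) * np.exp(s)
        H = (X * c[:, None]).T @ X / len(d)                     # negative Jacobian of g
        step = np.linalg.solve(H, g)
        t, f0 = 1.0, objective(beta)
        while objective(beta + t * step) < f0 - 1e-12 and t > 1e-8:
            t *= 0.5                                            # damped Newton ascent
        beta = beta + t * step
    return beta

# Illustrative data.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))

beta = cbps_just_identified(X, d)
Xd = np.column_stack([np.ones(n), X])
e_hat = 1 / (1 + np.exp(-Xd @ beta))
imbalance = Xd.T @ (d / e_hat - (1 - d) / (1 - e_hat)) / n     # zero up to tolerance
```

With these propensities, the inverse-probability-weighted covariate totals of the treated and control groups coincide by construction, which a maximum-likelihood fit does not guarantee in finite samples.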

4. Overlap weights (ATO)

Li-Morgan-Zaslavsky (2018). Weights each unit by its propensity of receiving the "other" treatment, yielding effects on the overlap population — the subpopulation where both treatments are plausible.

r = sp.overlap_weights(df, y='y', treat='d', covariates=[...],
                       estimand='ATO')

Avoids extreme weights from near-zero / near-one propensities.

5. Doubly-robust estimators

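AIPW (augmented inverse probability weighting) combines an outcome model and a propensity-score model. The estimator is consistent if either model is correctly specified, which is what "doubly robust" means.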

r = sp.aipw(df, y='y', treat='d', covariates=[...])
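
The doubly-robust property can be seen in a small sketch of the AIPW score. The helpers `aipw_ate` and `ols_predict` are hypothetical and not part of StatsPAI; the outcome models are plain OLS fitted per arm, and the propensity score is deliberately misspecified as a constant to show that a correct outcome model is enough.

```python
import numpy as np

def aipw_ate(y, d, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the ATE and its standard error from fitted nuisances."""
    psi = (mu1_hat - mu0_hat
           + d * (y - mu1_hat) / e_hat
           - (1 - d) * (y - mu0_hat) / (1 - e_hat))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi))

def ols_predict(X_fit, y_fit, X_new):
    """Fit OLS with an intercept on (X_fit, y_fit) and predict at X_new."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef

# Illustrative data: true ATE = 2, treatment depends on x.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = 1.0 + 2.0 * d + x[:, 0] + rng.normal(scale=0.5, size=n)

mu1_hat = ols_predict(x[d == 1], y[d == 1], x)   # outcome model for the treated arm
mu0_hat = ols_predict(x[d == 0], y[d == 0], x)   # outcome model for the control arm
e_wrong = np.full(n, d.mean())                   # misspecified: ignores x entirely
ate, se = aipw_ate(y, d, e_wrong, mu1_hat, mu0_hat)
```

Because the outcome models are correct here, the estimate stays close to 2 even with the wrong propensity score. The reverse case (correct propensity, wrong outcome model) works the same way.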

For high-dimensional covariates, use Double ML (Chernozhukov et al. 2018):

r = sp.dml(df, y='y', treat='d', covariates=[...],
           ml_model='lasso',         # or 'rf', 'xgb'
           cross_fitting_folds=5)

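DML is a standard choice for estimating the ATE from observational data with many controls: cross-fitting lets you use flexible ML learners for the nuisance functions while keeping valid inference.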
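
The cross-fitting step can be sketched for the partially linear model: residualise the outcome and the treatment on the covariates using out-of-fold predictions, then regress residual on residual. This is an illustration, not StatsPAI's `sp.dml`; the names `dml_plr` and `ols_fit_predict` are hypothetical, OLS stands in for the ML learner, and the coefficient equals the ATE only under a constant treatment effect.

```python
import numpy as np

def ols_fit_predict(X_train, t_train, X_test):
    """Stand-in learner: OLS with an intercept. Swap in any ML regressor."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, t_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def dml_plr(X, d, y, n_folds=5, learner=ols_fit_predict, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + noise."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds  # random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        # Nuisance models are fitted on the other folds only.
        y_res[test] = y[test] - learner(X[~test], y[~test], X[test])
        d_res[test] = d[test] - learner(X[~test], d[~test], X[test])
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    eps = y_res - theta * d_res
    se = np.sqrt(np.mean(d_res ** 2 * eps ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

# Illustrative data: true effect = 2.
rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=(n, 5))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 2.0 * d + X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + rng.normal(scale=0.5, size=n)

theta, se = dml_plr(X, d, y)
```

The `learner` argument is where a lasso, random forest, or boosted model would plug in. Each observation's residual comes from a model that never saw that observation.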

6. Meta-learners (for heterogeneous effects)

If you want not just the ATE but a CATE function τ(X):

from statspai.metalearners import S_Learner, T_Learner, X_Learner, DR_Learner

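# cov_cols: list of covariate column names; df_new: new rows to score
dr = DR_Learner(outcome_model='rf', ps_model='lr')
dr.fit(df[cov_cols], df['d'], df['y'])   # covariates, treatment, outcome
cate = dr.predict(df_new[cov_cols])      # estimated CATE for each new row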

See the meta-learner guide for diagnostics (CATE calibration, policy value).
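
For intuition about what a meta-learner computes, here is a minimal T-learner sketch: fit one outcome model per treatment arm and take the difference of their predictions. The class `TLearnerOLS` is hypothetical, uses OLS as the base learner, and is not the StatsPAI `T_Learner`. The simulated data have a true CATE of 1 + x.

```python
import numpy as np

class TLearnerOLS:
    """T-learner with a separate OLS outcome model for each treatment arm."""

    def fit(self, X, d, y):
        X = np.asarray(X, dtype=float)
        d = np.asarray(d)
        y = np.asarray(y, dtype=float)
        self.coef_ = {}
        for arm in (0, 1):
            mask = d == arm
            A = np.column_stack([np.ones(mask.sum()), X[mask]])
            self.coef_[arm], *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        return self

    def predict(self, X_new):
        A = np.column_stack([np.ones(len(X_new)), np.asarray(X_new, dtype=float)])
        return A @ self.coef_[1] - A @ self.coef_[0]   # mu1(x) - mu0(x)

# Illustrative noiseless data: y = x + d * (1 + x), so tau(x) = 1 + x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
d = rng.binomial(1, 0.5, size=200)
y = x[:, 0] + d * (1 + x[:, 0])

cate = TLearnerOLS().fit(x, d, y).predict(np.array([[0.5], [-1.0]]))
```

The same `fit` / `predict` pattern as the `DR_Learner` call above applies. The DR-learner additionally combines the outcome models with a propensity model before the final regression.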

7. Common mistakes

| Mistake | Fix |
|---|---|
| Including post-treatment variables in covariates | Drop them; never condition on consequences of treatment |
| Including colliders as covariates | Use a DAG (sp.DAG) to check adjustment sets |
| Reporting results without checking overlap | Always plot PS distributions (sp.psplot) |
| Reporting ATE when you computed ATT | Check estimand in the call / result |
| Using PSM nearest-neighbor (King-Nielsen 2019) | Use distance='mahalanobis' or method='cem' |
| Not trimming extreme weights | Use trim=0.01 or overlap weights |

8. Mandatory diagnostics

r = sp.ebalance(df, y='y', treat='d', covariates=[...])

# 1. Balance before/after
sp.love_plot(r)      # SMDs before and after weighting
sp.ps_balance(r)     # formal balance statistics

# 2. Overlap / common support
sp.overlap_plot(r)
sp.trimming(r, threshold=0.01)

# 3. Sensitivity to unobserved confounding
sp.sensemakr(r, benchmark_covariates=['X1'])  # Cinelli-Hazlett
sp.oster_bounds(r)                             # Oster 2019
sp.evalue(r)                                   # VanderWeele-Ding E-value
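
Two of the numbers behind these diagnostics are easy to compute by hand, which helps when sanity-checking a report. The sketch below gives a weighted standardized mean difference (SMD) and the Kish effective sample size. The function names are hypothetical, and standardising by the unweighted pooled SD is one common convention rather than necessarily the one StatsPAI uses.

```python
import numpy as np

def weighted_smd(x, d, w=None):
    """Standardized mean difference of covariate x between arms, optionally weighted."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d).astype(bool)
    w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)
    m1 = np.average(x[d], weights=w[d])
    m0 = np.average(x[~d], weights=w[~d])
    # Standardise by the unweighted pooled SD so before/after values are comparable.
    pooled_sd = np.sqrt((x[d].var(ddof=1) + x[~d].var(ddof=1)) / 2)
    return (m1 - m0) / pooled_sd

def effective_sample_size(w):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

x = np.array([1.0, 2.0, 3.0, 0.0, 1.0, 2.0])
d = np.array([1, 1, 1, 0, 0, 0])
w = np.array([1.0, 1.0, 1.0, 0.0, 1.0, 1.0])    # drops the first control

smd_before = weighted_smd(x, d)                  # 1.0
smd_after = weighted_smd(x, d, w)                # 0.5
ess_controls = effective_sample_size(w[d == 0])  # 2.0
```

A common rule of thumb treats an absolute SMD above 0.1 as imbalance. An effective sample size far below the raw control count means a few units carry most of the weight.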

9. Reading the output

r.estimate           # Point estimate (ATT / ATE / ATO)
r.se                 # Bootstrap or analytical SE
r.ci                 # CI
r.tidy()             # Main row + per-unit weights if detail available
r.glance()           # method, nobs, estimand, ESS (effective sample size)
r.detail             # If present: balance table with SMDs

10. Estimand cheat sheet

| Estimand | What it is | Recommended estimator |
|---|---|---|
| ATT | Average effect on the treated | ebalance, match(ATT) |
| ATE | Average effect on the population | cbps(ATE), aipw, dml |
| ATO | Effect on the overlap population | overlap_weights |
| ATC | Average effect on the controls | match(estimand='ATC') |
| CATE(x) | Conditional on covariates X=x | Meta-learners, causal forest |
| LATE | Effect on compliers | IV (not matching) |

For Agents

Pre-conditions
- binary treatment 0/1
- covariates are pre-treatment (temporally prior to D)
- enough control units for each treated unit under the chosen method (k:1 matching)
- covariates numeric; categoricals one-hot or handled by caliper/mahalanobis

Identifying assumptions
- Unconfoundedness / CIA: Y(d) ⊥ D | X
- Overlap / common support: treated X-values are in the control X-support
- SUTVA: no interference between matched units
- Covariates are selected before looking at outcomes (no post-treatment conditioning)

Failure modes → recovery

| Symptom | Exception | Remedy | Try next |
|---|---|---|---|
| Covariate imbalance after matching (max absolute SMD > 0.1) | statspai.AssumptionViolation | Re-match with stricter caliper, add interactions, or switch to sp.ebalance (entropy balancing). | sp.ebalance |
| Poor propensity score overlap (density plots, treated mass where controls are sparse) | statspai.AssumptionViolation | Apply sp.trimming (Crump et al. 2009) or redefine the estimand to the overlap region. | sp.trimming |
| Too few matched controls per treated unit | statspai.DataInsufficient | Relax caliper, allow with-replacement, or use entropy balancing / overlap weights. | sp.ebalance |
| Results highly sensitive to match specification | statspai.AssumptionWarning | Report sp.rosenbaum_bounds (sensitivity to unobserved confounding) and compare multiple matching methods. | sp.rosenbaum_bounds |

Alternatives (ranked)
- sp.ebalance
- sp.cbps
- sp.optimal_match
- sp.sbw
- sp.ipw

Typical minimum N: 200