Decomposition methods in StatsPAI
sp.decomposition ships 19 estimators under a single dispatcher
sp.decompose(method=...) — covering mean, distributional, inequality,
demographic, and causal decomposition. This guide is a quick map of
what's available, when to reach for each estimator, and what the v1.15
polish gives you (unified to_excel/to_word/cite/confint,
publication-quality plots, and the Yu--Elwert (2025) frontier method).
Choosing an estimator
| Question | Method | Key references |
|---|---|---|
| Decompose mean gap into endowment + coefficient | oaxaca | Blinder (1973), Oaxaca (1973) |
| Sequential omitted-variable-bias (OVB) attribution | gelbach | Gelbach (2016) |
| Logit / probit (binary outcome) | fairlie, bauer_sinning | Fairlie (2005), Bauer & Sinning (2008) |
| Quantile or Gini gap | rif, ffl | Firpo--Fortin--Lemieux (2009, 2018) |
| Reweight one group's X to match the other's | dfl | DiNardo, Fortin & Lemieux (1996) |
| Counterfactual quantile function | machado_mata, melly | Machado & Mata (2005), Melly (2005) |
| Counterfactual distribution | cfm | Chernozhukov, Fernández-Val & Melly (2013) |
| Inequality between vs. within groups | subgroup | Shorrocks (1980), Cowell & Flachaire (2007) |
| Inequality contribution per regressor | shapley_inequality | Shorrocks (2013) |
| Inequality by income source | gini_source | Lerman & Yitzhaki (1985) |
| Aggregate rate gap (categorical) | kitagawa, das_gupta | Kitagawa (1955), Das Gupta (1993) |
| What gap would close under intervention | gap_closing | Lundberg (2022) |
| Direct vs. mediated | mediation | VanderWeele (2014) |
| Disparity due to mediator | disparity (causal_jvw) | Jackson & VanderWeele (2018) |
| Causal disparity → baseline + prevalence + effect + selection | yu_elwert | Yu & Elwert (2025) |
When in doubt: a mean gap → oaxaca; a distributional gap → ffl;
a causal gap → yu_elwert if you have a binary intermediate
treatment, otherwise gap_closing for an aggregate intervention story.
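To make the "endowment + coefficient" split in the first table row concrete, here is a minimal hand-rolled sketch of a two-fold Oaxaca--Blinder decomposition with a single regressor. It is not StatsPAI's implementation: the helper names (ols_1d, oaxaca_twofold), the choice of group B's coefficients as the reference, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean


def ols_1d(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Split mean(y_a) - mean(y_b) using group B's coefficients as reference."""
    a0, a1 = ols_1d(x_a, y_a)
    b0, b1 = ols_1d(x_b, y_b)
    xa_bar, xb_bar = mean(x_a), mean(x_b)
    explained = (xa_bar - xb_bar) * b1              # endowment part
    unexplained = (a0 - b0) + xa_bar * (a1 - b1)    # coefficient part
    return {
        "gap": mean(y_a) - mean(y_b),
        "explained": explained,
        "unexplained": unexplained,
    }


# Toy data (made up): x is years of schooling, y is an hourly wage.
result = oaxaca_twofold(
    x_a=[10, 12, 14, 16], y_a=[20, 25, 29, 34],
    x_b=[8, 10, 12, 14], y_b=[15, 18, 22, 25],
)
print(result)
```

Because OLS with an intercept fits each group's mean exactly, the two parts add up to the raw gap. Switching the reference coefficients to group A changes how the gap is split but not the total.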
Yu--Elwert (2025): the frontier method
sp.yu_elwert_decompose(...) (alias sp.decompose("yu_elwert", ...) or
sp.decompose("cdgd", ...)) decomposes a group disparity
\(D = E[Y \mid R{=}1] - E[Y \mid R{=}0]\)
that operates through a binary treatment \(T\) into four mechanisms: baseline, prevalence, effect, and selection.
Identification requires only conditional ignorability of \(T\) given \((R, X)\) — not that \(R\) itself is unconfounded — which is what makes this approach distinct from causal-mediation methods. The "selection" piece is the novel mechanism: it captures whether the right people (those with the largest individual gain) end up treated within each group.
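The four-way split into baseline, prevalence, effect, and selection can be checked on simulated data where both potential outcomes are known. The sketch below is an "oracle" illustration, not the estimator: it uses the accounting identity \(E[Y] = E[Y^0] + E[T]\,E[\tau] + \mathrm{Cov}(T, \tau)\) within each group, with \(\tau = Y^1 - Y^0\). Which group's mean multiplies each difference is an assumption here, so check the paper or the package for the exact convention. All function names and simulation parameters are hypothetical.

```python
import random


def group_moments(units):
    """Within-group means of Y, Y0, T, tau and the covariance of T with tau."""
    n = len(units)
    e_y0 = sum(u["y0"] for u in units) / n
    e_t = sum(u["t"] for u in units) / n
    e_tau = sum(u["y1"] - u["y0"] for u in units) / n
    e_t_tau = sum(u["t"] * (u["y1"] - u["y0"]) for u in units) / n
    e_y = sum(u["y0"] + u["t"] * (u["y1"] - u["y0"]) for u in units) / n
    return {"y": e_y, "y0": e_y0, "t": e_t, "tau": e_tau,
            "cov": e_t_tau - e_t * e_tau}


def oracle_four_way(group1, group0):
    """Four-way split of E[Y | R=1] - E[Y | R=0] with known potential outcomes."""
    m1, m0 = group_moments(group1), group_moments(group0)
    return {
        "disparity": m1["y"] - m0["y"],
        "baseline": m1["y0"] - m0["y0"],
        "prevalence": m0["tau"] * (m1["t"] - m0["t"]),
        "effect": m1["t"] * (m1["tau"] - m0["tau"]),
        "selection": m1["cov"] - m0["cov"],
    }


def simulate(n, base, p_treat, gain, sorting, rng):
    """Draw units; a larger `sorting` makes high-gain units likelier to be treated."""
    units = []
    for _ in range(n):
        y0 = base + rng.gauss(0, 1)
        tau = gain + rng.gauss(0, 1)
        p = min(max(p_treat + sorting * (tau - gain), 0.0), 1.0)
        t = 1 if rng.random() < p else 0
        units.append({"y0": y0, "y1": y0 + tau, "t": t})
    return units


rng = random.Random(0)
g1 = simulate(2000, base=1.0, p_treat=0.6, gain=2.0, sorting=0.2, rng=rng)
g0 = simulate(2000, base=0.5, p_treat=0.4, gain=1.5, sorting=0.0, rng=rng)
parts = oracle_four_way(g1, g0)
residual = parts["disparity"] - sum(
    parts[k] for k in ("baseline", "prevalence", "effect", "selection")
)
print(parts, residual)
```

The residual is zero up to floating-point error because the split is an algebraic identity. In real data only one potential outcome per unit is observed, which is why the estimators need the ignorability assumption and nuisance models.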
Two estimators are built in:
- method="plugin" (default): within-cell OLS for \(m_{rt}(X)\) and within-group logit for \(p_r(X)\), then plug-in expectations. Fast, algebraic, and the residual disparity − Σ components is exactly zero by construction.
- method="efficient": doubly-robust augmented moments with implicit cross-fitting. Recommended when the nuisance models are flexible / regularised; the small implied residual reflects the asymmetric augmentations rather than a bug.
```python
import statspai as sp

# df: a pandas DataFrame holding the outcome, treatment, group, and covariates.
r = sp.yu_elwert_decompose(
    data=df, y="y", treatment="t", group="r",
    x=["age", "educ", "exp"],
    method="plugin",
    n_boot=499,
)
r.summary()
fig, ax = r.plot()            # mechanism bar chart with 95% CI whiskers
r.to_word("yu_elwert.docx")   # full Word report
```
What the v1.15 polish ships
Every result object in sp.decomposition (Oaxaca, Gelbach, RIF, FFL,
DFL, Machado--Mata, Melly, CFM, Fairlie, Bauer--Sinning, Yun, Kitagawa,
Das Gupta, three inequality classes, GapClosing, Mediation, Disparity,
Yu--Elwert) inherits DecompResultMixin, exposing the same surface:
| Method | What it does |
|---|---|
| result.summary() | Pretty-printed text summary with significance stars and CIs. |
| result.plot(...) | Method-specific publication plot (forest, waterfall, mechanism, quantile process, etc.). |
| result.confint(alpha=0.05) | Normal-approx CIs from stored standard errors. |
| result.cite() | Numbered bibliography of canonical references. |
| result.cite("bibtex_keys") | The paper.bib keys. |
| result.to_dict() / result.to_json() | JSON-serialisable snapshot. |
| result.to_latex() | LaTeX tabular for the report. |
| result.to_excel("out.xlsx") | Multi-sheet workbook (Overall, Detailed, …). |
| result.to_word("out.docx") | One-shot Word report (requires python-docx). |
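As a rough illustration of what a normal-approximation interval built from a stored standard error looks like, the sketch below computes estimate ± z·SE using only the standard library. It is not the library's code: the function name normal_ci and the example numbers are made up.

```python
from statistics import NormalDist


def normal_ci(estimate, se, alpha=0.05):
    """Two-sided (1 - alpha) normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return estimate - z * se, estimate + z * se


# Hypothetical component estimates and standard errors.
components = {"explained": (0.30, 0.10), "unexplained": (0.05, 0.08)}
for name, (est, se) in components.items():
    lo, hi = normal_ci(est, se)
    crosses_zero = lo <= 0.0 <= hi
    print(f"{name}: [{lo:.3f}, {hi:.3f}] crosses zero: {crosses_zero}")
```

Intervals of this kind are only as good as the standard errors behind them; for bootstrap-based estimators a percentile interval may behave better in small samples.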
Plotting
statspai.decomposition.plots exposes a small toolkit that every
result class draws on:
- detailed_waterfall(df, ...): sign-coloured horizontal bars with optional 95% CI whiskers.
- forest_plot(df, ...): point estimates with CI lines; greys out rows whose CI crosses zero.
- quantile_process_plot(result, show_ci=True): gap, composition, structure as functions of \(\tau\) with shaded CI bands when SEs are available on the grid.
- counterfactual_cdf_plot(result): observed vs. counterfactual CDFs for CFM-style decompositions.
- mediation_forest(result): NDE / NIE / total with CIs.
- yu_elwert_mechanisms_plot(result): disparity, baseline, prevalence, effect, selection as a single bar chart.
- rif_heatmap(grid_df): variable × quantile contributions heatmap.
All plots share a single palette (DECOMP_PALETTE) and a despined
minimal style (apply_decomp_style), so a panel of figures from
different methods looks like part of the same report.
Inference
Every estimator that supports inference uses one of the following:
- Analytical (delta-method) standard errors when the formula is closed-form (Oaxaca, Gelbach, RIF, Yun, Bauer--Sinning, Kitagawa).
- Bootstrap (cluster-aware, percentile / basic / normal CIs) when the formula is intractable (DFL, FFL, Machado--Mata, Melly, CFM, GapClosing, Mediation, Disparity, Yu--Elwert).
- Wild bootstrap for residual-based statistics (statspai.decomposition._common.wild_bootstrap_stat) with Rademacher or Mammen weights and optional cluster IDs.
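The wild bootstrap idea can be sketched in a few lines: each replicate multiplies the residuals by random weights with mean zero and unit variance, drawn once per cluster when cluster IDs are given. The code below is a generic illustration and does not reproduce the signature or internals of wild_bootstrap_stat; the function names and the choice of statistic (the mean of the perturbed residuals) are assumptions.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
# Mammen two-point distribution: mean 0, variance 1.
MAMMEN_LOW = -(SQRT5 - 1) / 2
MAMMEN_HIGH = (SQRT5 + 1) / 2
MAMMEN_P_LOW = (SQRT5 + 1) / (2 * SQRT5)


def draw_weight(kind, rng):
    """Draw one wild-bootstrap weight of the requested kind."""
    if kind == "rademacher":
        return 1.0 if rng.random() < 0.5 else -1.0
    if kind == "mammen":
        return MAMMEN_LOW if rng.random() < MAMMEN_P_LOW else MAMMEN_HIGH
    raise ValueError(f"unknown weight kind: {kind!r}")


def wild_bootstrap_means(residuals, clusters=None, kind="rademacher",
                         n_boot=999, seed=0):
    """Bootstrap draws of the mean of weight-perturbed residuals."""
    rng = random.Random(seed)
    ids = list(clusters) if clusters is not None else list(range(len(residuals)))
    unique_ids = sorted(set(ids))
    draws = []
    for _ in range(n_boot):
        # One weight per cluster, shared by every observation in that cluster.
        w = {c: draw_weight(kind, rng) for c in unique_ids}
        draws.append(sum(w[c] * e for c, e in zip(ids, residuals)) / len(residuals))
    return draws


residuals = [0.4, -0.1, 0.3, -0.6, 0.2, -0.2]
clusters = ["a", "a", "b", "b", "c", "c"]
draws = wild_bootstrap_means(residuals, clusters, kind="mammen", n_boot=200)
print(len(draws), min(draws), max(draws))
```

Sharing one weight within a cluster preserves the within-cluster dependence of the residuals, which is the reason for passing cluster IDs at all.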
Reference parity
The plug-in Oaxaca, RIF, FFL, DFL, and Kitagawa estimators are
numerically aligned with Ben Jann's Stata oaxaca, Fernando
Rios-Avila's rif/ddecompose (Stata) and ddecompose (R), and the
Stata kob package introduced in Jann's 2024 UK Stata Conference
update. The Yu--Elwert estimator is aligned with the R cdgd package.
References
The bibliography backing every method (DOI-verified) lives in
paper.bib; you can dump the keys for a result with
result.cite("bibtex_keys") and pull the formatted strings with
result.cite("list").
Headline recent papers behind the v1.15 polish:
- Yu, A. & Elwert, F. (2025). Nonparametric causal decomposition of group disparities. Annals of Applied Statistics, 19(1), 821–845. doi:10.1214/24-AOAS1990.
- Oaxaca, R. L. & Sierminska, E. (2025). Oaxaca-Blinder meets Kitagawa: What is the link? PLOS ONE, 20(5), e0321874. doi:10.1371/journal.pone.0321874.
- Park, S., Kang, S., & Lee, C. (2024). Choosing an Optimal Method for Causal Decomposition Analysis with Continuous Outcomes. Sociological Methodology, 54(1), 92–117. doi:10.1177/00811750231183711.
- Ahrens, A., Hansen, C. B., Schaffer, M. E., & Wiemann, T. (2025). Model averaging and double machine learning. Journal of Applied Econometrics, 40(3), 249–269. doi:10.1002/jae.3103.
- Kröger, H. & Hartmann, J. (2021). Extending the Kitagawa-Oaxaca-Blinder decomposition approach to panel data. Stata Journal, 21(2), 360–410. doi:10.1177/1536867X211025800.
- Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata. Stata Journal, 20(1), 51–94. doi:10.1177/1536867X20909690.
For the full canonical list, run result.cite() on any result object.