Skip to content

Decomposition methods in StatsPAI

sp.decomposition ships 19 estimators under a single dispatcher sp.decompose(method=...) — covering mean, distributional, inequality, demographic, and causal decomposition. This guide is a quick map of what's available, when to reach for each estimator, and what the v1.15 polish gives you (unified to_excel/to_word/cite/confint, publication-quality plots, and the Yu--Elwert (2025) frontier method).

Choosing an estimator

Question Method One-liner
Decompose mean gap into endowment + coefficient oaxaca Blinder (1973), Oaxaca (1973)
Sequential OVB attribution gelbach Gelbach (2016)
Logit / probit (binary outcome) fairlie, bauer_sinning Fairlie (2005), Bauer & Sinning (2008)
Quantile or Gini gap rif, ffl Firpo--Fortin--Lemieux (2009, 2018)
Reweight one group's X to match the other's dfl DiNardo, Fortin & Lemieux (1996)
Counterfactual quantile function machado_mata, melly MM (2005), Melly (2005)
Counterfactual distribution cfm Chernozhukov, Fernández-Val & Melly (2013)
Inequality between vs. within groups subgroup Shorrocks (1980), Cowell & Flachaire (2007)
Inequality contribution per regressor shapley_inequality Shorrocks (2013)
Inequality by income source gini_source Lerman & Yitzhaki (1985)
Aggregate rate gap (categorical) kitagawa, das_gupta Kitagawa (1955), Das Gupta (1993)
What gap would close under intervention gap_closing Lundberg (2022)
Direct vs. mediated mediation VanderWeele (2014)
Disparity due to mediator disparity (causal_jvw) Jackson & VanderWeele (2018)
Causal disparity → baseline + prevalence + effect + selection yu_elwert Yu & Elwert (2025)

When in doubt: a mean gap → oaxaca; a distributional gap → ffl; a causal gap → yu_elwert if you have a binary intermediate treatment, otherwise gap_closing for an aggregate intervention story.

Yu--Elwert (2025) — the frontier method

sp.yu_elwert_decompose(...) (alias sp.decompose("yu_elwert", ...) or sp.decompose("cdgd", ...)) decomposes a group disparity \(D = E[Y \mid R{=}1] - E[Y \mid R{=}0]\) operating through a binary treatment \(T\) into four mechanisms:

\[ D = \underbrace{(E[Y(0) \mid R{=}1] - E[Y(0) \mid R{=}0])}_{\text{baseline}} + \underbrace{E_0[\tau]\big(E[T \mid R{=}1] - E[T \mid R{=}0]\big)}_{\text{prevalence}} + \underbrace{E[T \mid R{=}1]\big(E_1[\tau] - E_0[\tau]\big)}_{\text{effect}} + \underbrace{\operatorname{Cov}_1(T, \tau) - \operatorname{Cov}_0(T, \tau)}_{\text{selection}} \]

Identification requires only conditional ignorability of \(T\) given \((R, X)\)not that \(R\) itself is unconfounded — which is what makes this approach distinct from causal-mediation methods. The "selection" piece is the novel mechanism: it captures whether the right people (those with the largest individual gain) end up treated within each group.

Two estimators are built in:

  • method="plugin" (default): within-cell OLS for \(m_{rt}(X)\) and within-group logit for \(p_r(X)\), then plug-in expectations. Fast, algebraic, and the residual disparity − Σ components is exactly zero by construction.
  • method="efficient": doubly-robust augmented moments with implicit cross-fitting. Recommended when the nuisance models are flexible / regularised; the small implied residual reflects the asymmetric augmentations rather than a bug.
import statspai as sp

r = sp.yu_elwert_decompose(
    data=df, y="y", treatment="t", group="r",
    x=["age", "educ", "exp"],
    method="plugin",
    n_boot=499,
)
r.summary()
fig, ax = r.plot()           # mechanism bar chart with 95% CI whiskers
r.to_word("yu_elwert.docx")  # full Word report

What the v1.15 polish ships

Every result object in sp.decomposition (Oaxaca, Gelbach, RIF, FFL, DFL, Machado--Mata, Melly, CFM, Fairlie, Bauer--Sinning, Yun, Kitagawa, Das Gupta, three inequality classes, GapClosing, Mediation, Disparity, Yu--Elwert) inherits DecompResultMixin, exposing the same surface:

Method What it does
result.summary() Pretty-printed text summary with significance stars and CIs.
result.plot(...) Method-specific publication plot (forest, waterfall, mechanism, quantile process, etc.).
result.confint(alpha=0.05) Normal-approx CIs from stored standard errors.
result.cite() Numbered bibliography of canonical references.
result.cite("bibtex_keys") The paper.bib keys.
result.to_dict() / result.to_json() JSON-serialisable snapshot.
result.to_latex() LaTeX tabular for the report.
result.to_excel("out.xlsx") Multi-sheet workbook (Overall, Detailed, …).
result.to_word("out.docx") One-shot Word report (requires python-docx).

Plotting

statspai.decomposition.plots exposes a small toolkit that every result class draws on:

  • detailed_waterfall(df, ...) — sign-coloured horizontal bars with optional 95% CI whiskers.
  • forest_plot(df, ...) — point estimates with CI lines; greys out rows whose CI crosses zero.
  • quantile_process_plot(result, show_ci=True) — gap, composition, structure as functions of \(\tau\) with shaded CI bands when SEs are available on the grid.
  • counterfactual_cdf_plot(result) — observed vs. counterfactual CDFs for CFM-style decompositions.
  • mediation_forest(result) — NDE / NIE / total with CIs.
  • yu_elwert_mechanisms_plot(result) — disparity, baseline, prevalence, effect, selection as a single bar chart.
  • rif_heatmap(grid_df) — variable × quantile contributions heatmap.

All plots share a single palette (DECOMP_PALETTE) and a despined minimal style (apply_decomp_style), so a panel of figures from different methods looks like part of the same report.

Inference

Every estimator that supports inference accepts:

  • Analytical (delta-method) standard errors when the formula is closed-form (Oaxaca, Gelbach, RIF, Yun, Bauer--Sinning, Kitagawa).
  • Bootstrap (cluster-aware, percentile / basic / normal CIs) when the formula is intractable (DFL, FFL, Machado--Mata, Melly, CFM, GapClosing, Mediation, Disparity, Yu--Elwert).
  • Wild bootstrap for residual-based statistics (statspai.decomposition._common.wild_bootstrap_stat) with Rademacher or Mammen weights and optional cluster IDs.

Reference parity

The plug-in Oaxaca, RIF, FFL, DFL, and Kitagawa estimators are numerically aligned with Ben Jann's Stata oaxaca, Fernando Rios-Avila's rif/ddecompose (Stata) and ddecompose (R), and the Stata kob package introduced in Jann's 2024 UK Stata Conference update. The Yu--Elwert estimator is aligned with the R cdgd package.

References

The bibliography backing every method (DOI-verified) lives in paper.bib; you can dump the keys for a result with result.cite("bibtex_keys") and pull the formatted strings with result.cite("list").

Headline recent papers behind the v1.15 polish:

  • Yu, A. & Elwert, F. (2025). Nonparametric causal decomposition of group disparities. Annals of Applied Statistics, 19(1), 821–845. doi:10.1214/24-AOAS1990.
  • Oaxaca, R. L. & Sierminska, E. (2025). Oaxaca-Blinder meets Kitagawa: What is the link? PLOS ONE, 20(5), e0321874. doi:10.1371/journal.pone.0321874.
  • Park, S., Kang, S., & Lee, C. (2024). Choosing an Optimal Method for Causal Decomposition Analysis with Continuous Outcomes. Sociological Methodology, 54(1), 92–117. doi:10.1177/00811750231183711.
  • Ahrens, A., Hansen, C. B., Schaffer, M. E., & Wiemann, T. (2025). Model averaging and double machine learning. Journal of Applied Econometrics, 40(3), 249–269. doi:10.1002/jae.3103.
  • Kröger, H. & Hartmann, J. (2021). Extending the Kitagawa-Oaxaca-Blinder decomposition approach to panel data. Stata Journal, 21(2), 360–410. doi:10.1177/1536867X211025800.
  • Rios-Avila, F. (2020). Recentered influence functions (RIFs) in Stata. Stata Journal, 20(1), 51–94. doi:10.1177/1536867X20909690.

For the full canonical list, run result.cite() on any result object.