Decomposition Analysis

statspai.decomposition documents 18 decomposition methods across 13 modules (v0.9.2, ~6,200 LOC, 54 tests), covering mean, distributional, inequality, demographic, and causal decomposition. Method-level validation metadata records which functions are backed by reference evidence and which by API-stability commitments.

Unified dispatcher — sp.decompose

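import statspai as sp  # alias used throughout this page

# df is assumed to be a pandas DataFrame containing the columns named below.
r = sp.decompose(
    method='ffl',                 # 30 aliases accepted
    data=df, y='log_wage', group='female',
    x=['education', 'experience'],
    stat='quantile', tau=0.5,
    inference='analytical',       # 'analytical' | 'bootstrap' | 'none'
)
r.summary()
r.plot()
r.to_latex()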

Mean decomposition

| Function | Method / Paper |
| --- | --- |
| sp.oaxaca(df, ...) | Blinder-Oaxaca threefold with 5 reference coefficients (Blinder 1973, Oaxaca 1973, Neumark 1988, Cotton 1988, Reimers 1983) |
| sp.gelbach(df, ...) | Sequential orthogonal decomposition (Gelbach 2016, JoLE) |
| sp.fairlie(df, ...) | Nonlinear logit/probit decomposition (Fairlie 1999, 2005) |
| sp.bauer_sinning(df, ...) / sp.yun_nonlinear(df, ...) | Detailed nonlinear (Bauer-Sinning 2008; Yun 2004/05) |
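To make the threefold split concrete, the following is a minimal NumPy sketch of the textbook Blinder-Oaxaca identity. It is not the implementation behind sp.oaxaca; the helper names (ols, threefold), the choice of group B as the reference, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def threefold(X_a, y_a, X_b, y_b):
    """Split mean(y_a) - mean(y_b) into endowments, coefficients, interaction.

    Group B is used as the reference group (an assumption of this sketch).
    """
    beta_a, beta_b = ols(X_a, y_a), ols(X_b, y_b)
    xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)
    endowments = (xbar_a - xbar_b) @ beta_b
    coefficients = xbar_b @ (beta_a - beta_b)
    interaction = (xbar_a - xbar_b) @ (beta_a - beta_b)
    return endowments, coefficients, interaction


# Synthetic data: one covariate plus a constant, two groups of equal size.
rng = np.random.default_rng(0)
n = 500
educ_a = rng.normal(13, 2, n)
educ_b = rng.normal(12, 2, n)
X_a = np.column_stack([np.ones(n), educ_a])
X_b = np.column_stack([np.ones(n), educ_b])
y_a = 1.0 + 0.10 * educ_a + rng.normal(0, 0.3, n)
y_b = 0.9 + 0.08 * educ_b + rng.normal(0, 0.3, n)

parts = threefold(X_a, y_a, X_b, y_b)
gap = y_a.mean() - y_b.mean()
print(dict(zip(["endowments", "coefficients", "interaction"], parts)), gap)
```

Because each group regression includes a constant, the fitted value at the covariate means equals the group mean of the outcome, so the three terms add up to the raw gap exactly (up to floating-point error).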

Distributional decomposition

| Function | Method / Paper |
| --- | --- |
| sp.rifreg(df, ...) / sp.rif_decomposition(...) | RIF regression + OB (Firpo-Fortin-Lemieux 2009, Econometrica) |
| sp.ffl_decompose(df, ...) | Two-step detailed (FFL 2018) |
| sp.dfl_decompose(df, ...) | Reweighting counterfactuals (DiNardo-Fortin-Lemieux 1996) |
| sp.machado_mata(df, ...) | Simulation-based QR decomposition (MM 2005) |
| sp.melly_decompose(df, ...) | Analytical QR decomposition (Melly 2005) |
| sp.cfm_decompose(df, ...) | Distribution regression (Chernozhukov-Fernández-Val-Melly 2013) |
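The reweighting idea behind DFL-style counterfactuals can be illustrated with a single categorical covariate, where the reweighting factor is simply a ratio of cell shares. The sketch below is an assumption-laden toy, not the code behind sp.dfl_decompose: the function name dfl_weights, the toy data, and the use of raw cell frequencies instead of an estimated propensity model are all illustrative choices.

```python
from collections import Counter


def dfl_weights(x_target, x_source):
    """Weight each source observation by P(x | target) / P(x | source).

    Assumes every covariate value seen in the target also appears in the
    source (common support); this sketch does not check for that.
    """
    count_t, count_s = Counter(x_target), Counter(x_source)
    n_t, n_s = len(x_target), len(x_source)
    return [(count_t[x] / n_t) / (count_s[x] / n_s) for x in x_source]


# Toy data: group A's covariates, and group B's covariates and outcomes.
x_a = ["hs", "hs", "college", "college", "college"]
x_b = ["hs", "hs", "hs", "college"]
y_b = [10.0, 12.0, 11.0, 20.0]

w = dfl_weights(x_a, x_b)

# Counterfactual mean: group B's outcomes under group A's covariate mix.
counterfactual_mean = sum(wi * yi for wi, yi in zip(w, y_b)) / sum(w)

# After reweighting, B's college share matches A's college share.
college_share = sum(wi for wi, xi in zip(w, x_b) if xi == "college") / sum(w)
print(counterfactual_mean, college_share)
```

With continuous or many covariates, the cell-share ratio is typically replaced by a ratio of estimated group-membership probabilities; the weighted-average step stays the same.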

Inequality decomposition

| Function | Method / Paper |
| --- | --- |
| sp.subgroup_decompose(df, ...) | Between/within for Theil T/L, GE(α), Dagum Gini (1997), Atkinson, CV² (Shorrocks 1984) |
| sp.shapley_inequality(df, ...) | Shorrocks-Shapley allocation to covariates (Shorrocks 2013) |
| sp.source_decompose(df, ...) | Gini source decomposition (Lerman-Yitzhaki 1985) |
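As an illustration of a between/within split, the Theil T index decomposes exactly into an income-share-weighted sum of subgroup indices plus a between-group term. The sketch below uses only the standard library and unweighted data; the function names and the toy incomes are hypothetical and do not reflect the sp.subgroup_decompose interface.

```python
import math


def theil_t(y):
    """Theil T index of a list of strictly positive values."""
    mu = sum(y) / len(y)
    return sum((v / mu) * math.log(v / mu) for v in y) / len(y)


def theil_t_subgroups(groups):
    """Return (within, between) components of Theil T.

    groups maps a label to a list of strictly positive incomes.
    """
    pooled = [v for ys in groups.values() for v in ys]
    total = sum(pooled)
    mu = total / len(pooled)
    within = 0.0
    between = 0.0
    for ys in groups.values():
        income_share = sum(ys) / total      # subgroup share of total income
        mu_g = sum(ys) / len(ys)            # subgroup mean
        within += income_share * theil_t(ys)
        between += income_share * math.log(mu_g / mu)
    return within, between


# Toy incomes for two subgroups.
groups = {
    "north": [12.0, 15.0, 30.0, 45.0],
    "south": [8.0, 9.0, 14.0, 22.0, 60.0],
}
within, between = theil_t_subgroups(groups)
overall = theil_t([v for ys in groups.values() for v in ys])
print(within, between, overall)
```

The two components sum to the pooled index, which is the additivity property that makes the generalized entropy family convenient for subgroup analysis.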

Demographic standardisation

| Function | Method / Paper |
| --- | --- |
| sp.kitagawa_decompose(df, ...) | Two-factor rate decomposition (Kitagawa 1955) |
| sp.das_gupta(df_a, df_b, ...) | Multi-factor symmetric (Das Gupta 1993) |
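The two-factor case can be written out in a few lines. The sketch below follows the standard Kitagawa identity, in which the difference between two crude rates splits into a rate effect and a composition effect, each evaluated at the average of the other factor. The function name, argument layout, and numbers are illustrative assumptions rather than the sp.kitagawa_decompose signature.

```python
def kitagawa(shares_a, rates_a, shares_b, rates_b):
    """Split crude_rate_a - crude_rate_b into (rate_effect, composition_effect).

    shares_* are population shares per stratum (each list sums to 1);
    rates_* are the stratum-specific rates, in the same stratum order.
    """
    strata = list(zip(shares_a, rates_a, shares_b, rates_b))
    rate_effect = sum((ra - rb) * (ca + cb) / 2 for ca, ra, cb, rb in strata)
    composition_effect = sum((ca - cb) * (ra + rb) / 2 for ca, ra, cb, rb in strata)
    return rate_effect, composition_effect


# Toy example with three age strata.
shares_a, rates_a = [0.5, 0.3, 0.2], [0.010, 0.020, 0.050]
shares_b, rates_b = [0.3, 0.3, 0.4], [0.012, 0.025, 0.060]

rate_effect, composition_effect = kitagawa(shares_a, rates_a, shares_b, rates_b)
crude_a = sum(c * r for c, r in zip(shares_a, rates_a))
crude_b = sum(c * r for c, r in zip(shares_b, rates_b))
print(rate_effect, composition_effect, crude_a - crude_b)
```

Averaging the other factor across the two populations is what makes the split symmetric: swapping the labels of A and B flips the sign of both effects without changing their magnitudes.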

Causal decomposition

| Function | Method / Paper |
| --- | --- |
| sp.gap_closing(df, method=...) | Gap-closing estimator (Lundberg 2021), regression / IPW / AIPW |
| sp.mediation_decompose(df, ...) | Natural direct/indirect effects (VanderWeele 2014) |
| sp.disparity_decompose(df, ...) | Causal disparity decomposition (Jackson-VanderWeele 2018) |

Quality bar

  • Closed-form influence functions for Theil T / Theil L / Atkinson (no O(n²) numerical fallback).
  • Weighted O(n log n) Dagum Gini via sorted-ECDF pairwise-MAD identity.
  • Cross-method consistency tests: test_dfl_ffl_mean_agree, test_mm_melly_cfm_aligned_reference, test_dfl_mm_reference_convention_opposite.
  • Numerical identity checks: FFL four-part sum, weighted Gini RIF \(E_w[\text{RIF}] = G\).
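To show why a sorted pass can replace the O(n²) double sum for a weighted Gini, the sketch below compares a cumulative-weight formula against the brute-force pairwise definition. It is a generic illustration of this kind of identity, not the library's own routine; the function names and test data are hypothetical.

```python
def gini_sorted(y, w):
    """Weighted Gini in O(n log n): one sort plus one cumulative-weight pass."""
    pairs = sorted(zip(y, w))
    total_w = sum(w)
    mu = sum(v * wt for v, wt in pairs) / total_w
    weight_before = 0.0   # total weight of observations ranked below the current one
    acc = 0.0
    for v, wt in pairs:
        # (weight below) minus (weight above) equals 2*weight_before + wt - total_w
        acc += wt * v * (2 * weight_before + wt - total_w)
        weight_before += wt
    return acc / (mu * total_w * total_w)


def gini_pairwise(y, w):
    """Weighted Gini from the O(n^2) mean-absolute-difference definition."""
    total_w = sum(w)
    mu = sum(v * wt for v, wt in zip(y, w)) / total_w
    double_sum = sum(
        wi * wj * abs(yi - yj)
        for yi, wi in zip(y, w)
        for yj, wj in zip(y, w)
    )
    return double_sum / (2 * mu * total_w * total_w)


y = [3.0, 1.0, 7.0, 7.0, 2.5, 10.0]
w = [1.0, 2.0, 0.5, 1.5, 1.0, 3.0]
fast, slow = gini_sorted(y, w), gini_pairwise(y, w)
print(fast, slow)
```

The identity rests on the fact that, once the values are sorted, each observation's contribution to the pairwise absolute differences depends only on the total weight ranked below and above it, so a running sum of weights is enough.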