
statspai.decomposition

decomposition

Decomposition Analysis module for StatsPAI.

Decomposition toolkit covering mean, distributional, inequality, demographic, and causal decomposition methods under a unified API: sp.decompose(method=...).

Methods (19 in total — Yu-Elwert added in v1.15)

Mean decomposition
- oaxaca: Blinder-Oaxaca (Blinder 1973; Oaxaca 1973) with 5 reference-coefficient choices (A, B, pooled/Neumark, Cotton, Reimers)
- gelbach: Gelbach (2016) order-invariant decomposition of the change in a coefficient (omitted-variable bias) when controls are added
- fairlie: Fairlie (2005) nonlinear decomposition for logit/probit
- bauer_sinning / yun_nonlinear: Bauer-Sinning (2008) + Yun (2005) detailed nonlinear decomposition

Distributional decomposition
- rif: recentered influence function (RIF) regression + Oaxaca-Blinder decomposition (Firpo-Fortin-Lemieux 2009)
- ffl: Firpo-Fortin-Lemieux (2018) two-step detailed decomposition
- dfl: DiNardo-Fortin-Lemieux (1996) reweighting
- machado_mata: Machado-Mata (2005) quantile decomposition
- melly: Melly (2005) analytical quantile decomposition
- cfm: Chernozhukov-Fernández-Val-Melly (2013) counterfactual distributions via distribution regression

Inequality decomposition
- subgroup: between/within decomposition (Theil T/L, GE, Gini, Atkinson, CV²)
- shapley_inequality: Shorrocks-Shapley (2013) allocation of inequality to covariates
- gini_source: Lerman-Yitzhaki (1985) Gini source decomposition

Demographic / standardisation
- kitagawa: Kitagawa (1955) two-factor rate decomposition
- das_gupta: Das Gupta (1993) multi-factor decomposition

Causal decomposition
- gap_closing: Lundberg (2022) gap-closing estimator (regression / IPW / AIPW)
- mediation: VanderWeele (2014) natural direct/indirect effects
- disparity / causal_jvw: Jackson-VanderWeele (2018) causal disparity decomposition
- yu_elwert: Yu & Elwert (2025) nonparametric causal decomposition of group disparities into baseline, prevalence, effect, and selection components (efficient-influence-function-based; ML-friendly)

Unified Entry

sp.decompose(method=..., **kwargs) dispatches to any of the above.

Polish (v1.15)

Every result class now inherits DecompResultMixin, exposing a common .confint(), .cite(), .to_dict(), .to_json(), .to_excel(), and .to_word() surface in addition to each method's bespoke .summary() / .plot() / .to_latex(). Plots share a common palette and minimalist style via statspai.decomposition.plots (forest plots, mediation forest, Yu-Elwert mechanism plot, RIF heatmap, …).

OaxacaResult

Bases: DecompResultMixin

Result container for Oaxaca-Blinder decomposition.

Attributes:

Name Type Description
overall dict

Keys: 'gap', 'explained', 'unexplained', 'explained_se', 'unexplained_se', 'unexplained_a', 'unexplained_b' (threefold components).

detailed DataFrame

Variable-level decomposition with columns contribution, se, pct_of_explained.

group_stats dict

Per-group means, coefficients, standard errors, sample sizes.

reference str or int

Reference weight specification used.
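To make the gap = explained + unexplained bookkeeping behind the overall and detailed attributes concrete, here is a minimal NumPy sketch of a twofold decomposition for an arbitrary reference coefficient vector. It is not the statspai implementation: the helper names (ols, oaxaca_twofold), the simulated data, and the pooled reference estimated without a group indicator are all assumptions for illustration.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients via least squares (X must include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_twofold(XA, yA, XB, yB, beta_ref):
    """Hypothetical helper: twofold split of mean(yA) - mean(yB) for a given beta_ref."""
    bA, bB = ols(XA, yA), ols(XB, yB)
    xA, xB = XA.mean(axis=0), XB.mean(axis=0)
    gap = yA.mean() - yB.mean()
    # Per-variable endowment contributions; they sum to the explained part.
    detailed = (xA - xB) * beta_ref
    explained = detailed.sum()
    # Coefficient differences of each group relative to the reference vector.
    unexplained = xA @ (bA - beta_ref) + xB @ (beta_ref - bB)
    return {"gap": gap, "explained": explained,
            "unexplained": unexplained, "detailed": detailed}


rng = np.random.default_rng(0)
n = 500
# Simulated design: constant, education, experience.
XA = np.column_stack([np.ones(n), rng.normal(13, 2, n), rng.normal(10, 5, n)])
XB = np.column_stack([np.ones(n), rng.normal(12, 2, n), rng.normal(8, 5, n)])
yA = XA @ np.array([1.0, 0.08, 0.02]) + rng.normal(0, 0.3, n)
yB = XB @ np.array([0.9, 0.07, 0.02]) + rng.normal(0, 0.3, n)

# Pooled (Neumark-style) reference: OLS on both groups stacked, no group dummy.
beta_pooled = ols(np.vstack([XA, XB]), np.concatenate([yA, yB]))
res = oaxaca_twofold(XA, yA, XB, yB, beta_pooled)
```

Because OLS with a constant reproduces each group's mean exactly, the identity holds for any beta_ref; only the split between the explained and unexplained parts changes with the reference.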

summary

summary() -> str

Return formatted decomposition summary.

plot

plot(figsize=(8, 5), kind: str = 'waterfall', **kwargs)

Bar / forest chart of per-variable explained contributions.

Parameters:

Name Type Description Default
kind ('waterfall', 'forest')

"waterfall" (default) is a sign-coloured bar chart with optional 95% CI whiskers; "forest" shows point estimates with CI lines and greys out non-significant rows.

"waterfall"

to_latex

to_latex() -> str

Return a LaTeX-formatted decomposition table.

GelbachResult

Bases: DecompResultMixin

Result container for Gelbach (2016) decomposition.

Attributes:

Name Type Description
total_change float

Total change in the base coefficient when added controls are included: beta_base - beta_full.

decomposition DataFrame

Per-variable contributions with columns delta, se, pct_of_change.

base_coef float

Coefficient of interest from the base (short) regression.

full_coef float

Coefficient of interest from the full (long) regression.

base_var str

Name of the variable of interest.

summary

summary() -> str

Return formatted Gelbach decomposition summary.

plot

plot(figsize=(8, 5), color='#4CAF50')

Horizontal bar chart of Gelbach contributions.

Returns:

Type Description
(fig, ax)

to_latex

to_latex() -> str

Return LaTeX table of the decomposition.

DFLResult dataclass

Bases: DecompResultMixin

Container for DFL reweighting decomposition results.

plot

plot(**kwargs)

Delegate to plots.dfl_plot().

FFLResult dataclass

Bases: DecompResultMixin

Container for Firpo-Fortin-Lemieux two-step decomposition.

MachadoMataResult dataclass

Bases: DecompResultMixin

Container for Machado-Mata decomposition.

MellyResult dataclass

Bases: DecompResultMixin

Container for Melly quantile decomposition.

CFMResult dataclass

Bases: DecompResultMixin

Chernozhukov-Fernández-Val-Melly counterfactual distribution result.

YuElwertResult dataclass

Bases: DecompResultMixin

Yu-Elwert (2025) causal decomposition of a group disparity.

Attributes:

Name Type Description
disparity float

Observed gap E[Y|R=1] - E[Y|R=0].

baseline float

Counterfactual disparity if no one were treated.

prevalence float

Contribution of differential treatment uptake between groups (R=1 vs. R=0), scaled by the average treatment effect in the reference group (R=0).

effect float

Contribution of between-group differences in average treatment effects, scaled by the advantaged group's (R=1) treatment prevalence.

selection float

Between-group difference in the within-group covariance between treatment assignment and individual-level treatment effects (selection on gains); this is the signature mechanism of Yu-Elwert.

se dict[str, float] | None

Standard errors keyed by component (disparity, baseline, prevalence, effect, selection).

ci dict[str, (float, float)] | None

Two-sided confidence intervals at level 1 - alpha (95% at the default alpha = 0.05).

detailed DataFrame

Tidy table of (component, value, se, ci_low, ci_high).

nuisance dict

Diagnostic snapshot — group sizes, fitted per-cell means, plus fallback_cell_count and bootstrap_failure_count when applicable so the user can audit a degenerate run.

method str

"plugin" or "efficient".

gelbach

gelbach(data: DataFrame, y: str, base_x: Sequence[str], added_x: Sequence[str], var_of_interest: Optional[str] = None, alpha: float = 0.05) -> GelbachResult

Gelbach (2016) decomposition of omitted variable bias.

When controls are added to a regression, the coefficient on a variable of interest may change. This function decomposes that change into contributions from each added variable, answering: "Which added controls explain the change, and by how much?"

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
base_x list of str

Variables in the base (short) specification.

required
added_x list of str

Variables added to obtain the full (long) specification.

required
var_of_interest str

Which base variable's coefficient change to decompose. Defaults to the first element of base_x.

None
alpha float

Significance level.

0.05

Returns:

Type Description
GelbachResult

Result object with .summary(), .plot(), .to_latex().

Notes

The Gelbach identity:

$$
\hat\beta^{\text{base}}_k - \hat\beta^{\text{full}}_k
= \sum_{j \in \text{added}} \tilde\gamma_{kj}\, \hat\beta^{\text{full}}_j
$$

where $\tilde\gamma_{kj}$ is the coefficient on base variable k from regressing added variable j on all base variables (including the constant). Each summand is the contribution of added variable j to the total change.

Examples:

>>> import statspai as sp
>>> df = sp.decomposition.datasets.cps_wage()
>>> result = sp.gelbach(
...     data=df, y="log_wage",
...     base_x=["education"],
...     added_x=["experience", "tenure", "union"],
... )
>>> result.summary()
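The Gelbach identity in the Notes is exact OLS algebra, so it can be checked numerically. The sketch below is a NumPy illustration, not the library code; gelbach_check and the simulated wage data are hypothetical. It computes the total change in one base coefficient and the per-added-variable terms γ̃_kj · β̂_j^full, which should sum to that change up to floating-point error.

```python
import numpy as np


def gelbach_check(X_base, X_added, y, k):
    """Hypothetical helper: total change in base coefficient k and its pieces."""
    n = len(y)
    Xb = np.column_stack([np.ones(n), X_base])   # base (short) design
    Xf = np.column_stack([Xb, X_added])          # full (long) design
    beta_base = np.linalg.lstsq(Xb, y, rcond=None)[0]
    beta_full = np.linalg.lstsq(Xf, y, rcond=None)[0]
    # Column j of gamma: coefficients from regressing added variable j
    # on all base variables, including the constant.
    gamma = np.linalg.lstsq(Xb, X_added, rcond=None)[0]
    beta_added = beta_full[Xb.shape[1]:]
    row = k + 1                                  # row 0 is the constant
    deltas = gamma[row, :] * beta_added          # gamma_kj * beta_j^full
    total = beta_base[row] - beta_full[row]
    return total, deltas


rng = np.random.default_rng(1)
n = 1000
educ = rng.normal(12, 2, n)
exper = 0.5 * educ + rng.normal(0, 3, n)
tenure = 0.2 * exper + rng.normal(0, 2, n)
y = 0.1 * educ + 0.03 * exper + 0.02 * tenure + rng.normal(0, 0.5, n)

total, deltas = gelbach_check(educ[:, None], np.column_stack([exper, tenure]), y, k=0)
```

Unlike adding controls one at a time, this split does not depend on the order in which the added variables are listed.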

rifreg

rifreg(formula: str, data: DataFrame, statistic: StatisticKind = 'quantile', tau: float = 0.5) -> RIFResult

RIF regression (Firpo, Fortin & Lemieux 2009).

Parameters:

Name Type Description Default
formula str

"y ~ x1 + x2" style.

required
data DataFrame
required
statistic ('quantile', 'variance', 'gini')
"quantile"
tau float

Quantile level; the default 0.5 gives unconditional quantile partial effects (UQPE) at the median.

0.5

rif_decomposition

rif_decomposition(formula: str, data: DataFrame, group: str, statistic: StatisticKind = 'quantile', tau: float = 0.5, reference: int = 0) -> RIFDecompositionResult

RIF Oaxaca-Blinder decomposition (FFL 2009, Section 5).

Decomposes the between-group difference in a distributional statistic into explained (covariate endowment) and unexplained (coefficient) parts at the chosen statistic.

Parameters:

Name Type Description Default
group str

Binary (0/1) group indicator column.

required
reference int

Which group's coefficients to use as the reference (0 or 1).

0

rif_values

rif_values(y: ndarray, statistic: StatisticKind = 'quantile', tau: float = 0.5) -> ndarray

Compute the RIF of each observation.

Delegates to statspai.decomposition._common.influence_function, the canonical implementation shared with the FFL two-step decomposition. Supported statistics (expanded in this release): quantile (with tau), mean, variance, std, log_var, iqr, gini, theil_t, theil_l, and atkinson (ε = 1).

Parameters:

Name Type Description Default
y (n,) array
required
statistic str
'quantile'
tau float

Quantile level (only used when statistic="quantile").

0.5
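For intuition about what rif_values returns, here is a NumPy sketch of two standard RIFs: the quantile RIF, q_τ + (τ − 1{y ≤ q_τ}) / f_Y(q_τ), and the variance RIF, (y − μ)². The Gaussian kernel density with a Silverman rule-of-thumb bandwidth is an assumption of this sketch; statspai's density estimator and bandwidth may differ. A defining property checked below is that each RIF averages (approximately, for the quantile) to the statistic itself.

```python
import numpy as np


def gaussian_kde_at(y, point):
    """Gaussian kernel density of y at one point (Silverman rule-of-thumb bandwidth)."""
    n = len(y)
    h = 1.06 * np.std(y, ddof=1) * n ** (-1 / 5)
    z = (y - point) / h
    return np.exp(-0.5 * z**2).sum() / (n * h * np.sqrt(2 * np.pi))


def rif_quantile(y, tau):
    """RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau)."""
    q = np.quantile(y, tau)
    f_q = gaussian_kde_at(y, q)
    return q + (tau - (y <= q).astype(float)) / f_q


def rif_variance(y):
    """RIF(y; var) = (y - mu)^2; its mean is the ddof=0 sample variance."""
    return (y - y.mean()) ** 2


rng = np.random.default_rng(2)
y = rng.normal(size=2001)
rif_med = rif_quantile(y, 0.5)
rif_var = rif_variance(y)
```

Regressing such RIF values on covariates by OLS is what turns them into the unconditional partial effects reported by rifreg.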

dfl_decompose

dfl_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], stat: str = 'mean', tau: float = 0.5, reference: int = 0, weights: Optional[Union[str, ndarray]] = None, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, quantile_grid: Optional[Sequence[float]] = None, seed: Optional[int] = 12345) -> DFLResult

DFL (1996) reweighting decomposition at a chosen distributional statistic.

Parameters:

Name Type Description Default
data DataFrame
required
y str — outcome variable name
required
group str — binary (0/1) group indicator
required
x Sequence[str] — covariates used for propensity model
required
stat ('mean', 'variance', 'std', 'quantile', 'iqr', 'gini', 'log_var')
'mean'
tau float — quantile level (when stat='quantile')
0.5
reference (0, 1)
  • 0: reweight Group B to look like A's X (default). The counterfactual is F_{Y<1|0>} — A's X distribution with B's outcome structure.
  • 1: reweight Group A to look like B's X. The counterfactual is F_{Y<0|1>} — B's X distribution with A's outcome structure.

Warning: reference has different economic semantics across method families. In DFL, reference=0 yields cf = A's X, B's β (reweighting approach). In machado_mata / melly / cfm, reference=0 yields cf = A's β, B's X (coefficient-substitution approach). These are opposite counterfactual constructions. Within each method the labels are internally consistent (DFL structure = A − cf; MM composition = A − cf). When comparing estimates across methods, read the per-method docstrings carefully.

0
weights str, array or None — sample weights
None
trim float — clip propensity scores to [trim, 1-trim]
0.001
inference ('none', 'bootstrap', 'analytical')
'analytical'
n_boot int — bootstrap replications
299
alpha float: significance level (CIs have coverage 1 − alpha)
0.05
quantile_grid sequence of τ ∈ (0, 1) or None

If provided, also compute quantile-process decomposition on this grid.

None
seed int or None
12345

Returns:

Type Description
DFLResult
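The reweighting logic behind reference=0 can be sketched for the mean with a single discrete covariate. This is an illustration only: it assumes group 0 is A and group 1 is B (matching the F_{Y<1|0>} notation above), and it swaps in a saturated cell-frequency propensity in place of the fitted propensity model dfl_decompose uses. With that swap, group B's reweighted X distribution matches A's exactly, and the counterfactual mean combines A's X with B's outcome structure.

```python
import numpy as np


def dfl_mean_discrete(y, g, x):
    """Hypothetical helper: DFL split of mean(y | g=0) - mean(y | g=1), discrete x.

    Group 1 (B) is reweighted to group 0's (A's) x distribution.
    """
    y, g, x = np.asarray(y, float), np.asarray(g), np.asarray(x)
    in0, in1 = g == 0, g == 1
    n0, n1 = in0.sum(), in1.sum()
    psi = np.zeros(len(y))
    for cell in np.unique(x):
        c0 = np.sum(in0 & (x == cell))
        c1 = np.sum(in1 & (x == cell))
        if c1 == 0 and c0 > 0:
            raise ValueError(f"no group-1 support for x={cell!r}")
        if c1 > 0:
            # Within-cell odds of group 0 vs 1, divided by the overall odds.
            psi[in1 & (x == cell)] = (c0 / c1) / (n0 / n1)
    w = psi[in1]
    cf = np.sum(w * y[in1]) / np.sum(w)   # A's X, B's outcome structure
    mean0, mean1 = y[in0].mean(), y[in1].mean()
    return {"gap": mean0 - mean1, "structure": mean0 - cf,
            "composition": cf - mean1, "weights": psi}


rng = np.random.default_rng(3)
n = 4000
g = rng.integers(0, 2, n)
# Group 1 also occupies a cell (x = 3) that group 0 never reaches; it gets weight 0.
x = np.where(g == 1, rng.integers(0, 4, n), rng.integers(0, 3, n))
y = 1.0 + 0.5 * x + 0.3 * g + rng.normal(0, 1, n)
res = dfl_mean_discrete(y, g, x)
```

With continuous covariates the cell frequencies are replaced by an estimated propensity score, and the trim argument guards against extreme weights.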

ffl_decompose

ffl_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], stat: str = 'quantile', tau: float = 0.5, reference: int = 0, weights: Optional[Union[str, ndarray]] = None, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> FFLResult

Firpo-Fortin-Lemieux two-step detailed distributional decomposition.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
group str — binary {0, 1}
required
x Sequence[str]
required
stat {'quantile', 'mean', 'variance', 'std', 'iqr', 'gini', 'log_var', 'theil_t', 'theil_l', 'atkinson'}
'quantile'
tau float (for quantile)
0.5
reference int {0, 1}

0: B reweighted to look like A's X (composition = effect of A's X on B's outcomes relative to observed B)

0
weights (str, array or None)
None
trim float — propensity trim
0.001
inference ('analytical', 'bootstrap', 'none')
'analytical'
n_boot int
299
alpha float
0.05
seed int or None
12345

melly_decompose

melly_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], tau_grid: Optional[Sequence[float]] = None, reference: int = 0, n_tau_qr: int = 99) -> MellyResult

Melly (2005) quantile decomposition.

Parameters:

Name Type Description Default
data DataFrame
required
y column names
required
group column names
required
x column names
required
tau_grid Sequence[float] or None — reporting τ grid
None
reference (0, 1)

Same convention as machado_mata: reference=0 uses A's β on B's X (coefficient-swap counterfactual F_{Y<0|1>}), opposite to dfl_decompose whose reference=0 uses A's X with B's β.

0
n_tau_qr int — QR estimation grid resolution
99

Returns:

Type Description
MellyResult

cfm_decompose

cfm_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], tau_grid: Optional[Sequence[float]] = None, reference: int = 0, n_thresh: int = 40, ks_test: bool = True) -> CFMResult

Chernozhukov-Fernández-Val-Melly (2013) counterfactual decomposition.

Parameters:

Name Type Description Default
data DataFrame
required
y column names
required
group column names
required
x column names
required
tau_grid Sequence[float] or None
None
reference (0, 1)

Same convention as machado_mata / melly_decompose: reference=0 builds the counterfactual from A's distribution regression coefficients applied to B's X (F_{Y<0|1>}), opposite to the reweighting convention in dfl_decompose.

0
n_thresh int — number of thresholds for distribution regression
40
ks_test bool — whether to compute Kolmogorov-Smirnov gap test
True

Returns:

Type Description
CFMResult

fairlie

fairlie(data: DataFrame, y: str, group: str, x: Sequence[str], model: str = 'logit', reference: int = 0, n_sim: int = 500, seed: Optional[int] = 12345) -> NonlinearDecompResult

Fairlie (2005) nonlinear decomposition for binary outcomes.

Procedure: fit model on reference group; rank-match one group onto the other; compute mean predicted probability under counterfactual X; variable-level contribution = change in mean prediction when that variable is swapped to the other group's value.

Parameters:

Name Type Description Default
data DataFrame
required
y str — binary {0, 1}
required
group str — binary
required
x Sequence[str]
required
model ('logit', 'probit')
'logit'
reference (0, 1)
0
n_sim int — number of random matchings to average over
500
seed int or None
12345

bauer_sinning

bauer_sinning(data: DataFrame, y: str, group: str, x: Sequence[str], model: str = 'logit', reference: int = 0, variant: str = 'yun') -> NonlinearDecompResult

Bauer-Sinning (2008) nonlinear Oaxaca-Blinder decomposition with Yun (2004, 2005) weights for detailed contributions.

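The three-fold split of the gap (which starts from the coefficient term E(p_a(X_a)) − E(p_r(X_a))) is not used for the detailed results. Instead, the explained part is allocated across variables with Yun's weights: explained_j = w_j · (E(p_r(X_a)) − E(p_r(X_b))), where w_j = (Δx̄_j · β_r,j) / Σ_k (Δx̄_k · β_r,k) and Δx̄_j = x̄_a,j − x̄_b,j.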

Parameters:

Name Type Description Default
model ('logit', 'probit')
'logit'
reference (0, 1)
0
variant 'yun'
'yun'
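Yun's weighting of the explained part can be sketched directly from the formula above. To keep the example short it takes the reference logit coefficients beta_r as given rather than estimating them, and the helper names and simulated designs are hypothetical; the point is that the weights sum to one, so the variable-level terms add up to the aggregate explained difference.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def yun_explained(Xa, Xb, beta_r):
    """Hypothetical helper: split the explained part of a logit gap with Yun weights.

    Xa, Xb: design matrices with a constant in column 0; beta_r: reference coefficients.
    Assumes sum_k dx_k * beta_r,k is nonzero.
    """
    explained_total = logistic(Xa @ beta_r).mean() - logistic(Xb @ beta_r).mean()
    terms = (Xa.mean(axis=0) - Xb.mean(axis=0)) * beta_r   # dx_j * beta_r,j
    w = terms / terms.sum()                                 # Yun weights, sum to 1
    return explained_total, w * explained_total


rng = np.random.default_rng(4)
n = 1000
Xa = np.column_stack([np.ones(n), rng.normal(0.5, 1, n), rng.normal(1.0, 1, n)])
Xb = np.column_stack([np.ones(n), rng.normal(0.0, 1, n), rng.normal(0.5, 1, n)])
beta_r = np.array([-0.5, 0.8, 0.4])   # illustrative reference coefficients
total, contrib = yun_explained(Xa, Xb, beta_r)
```

The weights use the linear index, which is what keeps the detailed terms additive even though the probabilities are nonlinear in X.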

inequality_index

inequality_index(y: ndarray, index: str = 'theil_t', weights: Optional[ndarray] = None, eps: float = 1.0, alpha: Optional[float] = None) -> float

Compute a single inequality index.

subgroup_decompose

subgroup_decompose(data: DataFrame, y: str, by: str, index: str = 'theil_t', weights: Optional[Union[str, ndarray]] = None, eps: float = 1.0, alpha: Optional[float] = None) -> SubgroupDecompResult

Subgroup decomposition (between / within) of an inequality index.

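Supported for the additively decomposable generalized-entropy (GE) family (theil_t, theil_l, mld, ge0, ge1, ge2, cv2, atkinson(ε=1)). The Gini is not additively decomposable; for it the function returns Dagum's (1997) Gini_B / Gini_W / Gini_overlap components.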

Parameters:

Name Type Description Default
data DataFrame
required
y str — outcome
required
by str — grouping variable
required
index str — inequality index name
'theil_t'
weights (str, array or None)
None
eps float — Atkinson parameter
1.0
alpha float or None — GE parameter override
None
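For the Theil T index the between/within split is an exact identity: T = Σ_g s_g T_g + Σ_g s_g ln(μ_g / μ), where s_g is group g's share of total income. The NumPy sketch below (unweighted, hypothetical helper names, simulated log-normal incomes) computes both terms and lets you confirm they add back to the overall index.

```python
import numpy as np


def theil_t(y):
    """Theil T index: mean((y/mu) * ln(y/mu)); requires y > 0."""
    s = y / y.mean()
    return np.mean(s * np.log(s))


def theil_t_subgroups(y, by):
    """Hypothetical helper: exact within/between split of Theil T across groups."""
    mu, n = y.mean(), len(y)
    within = between = 0.0
    for grp in np.unique(by):
        yg = y[by == grp]
        share = len(yg) * yg.mean() / (n * mu)   # group's share of total income
        within += share * theil_t(yg)
        between += share * np.log(yg.mean() / mu)
    return within, between


rng = np.random.default_rng(5)
y = rng.lognormal(0.0, 0.7, 3000)
by = rng.integers(0, 4, 3000)
within, between = theil_t_subgroups(y, by)
```

The between term is itself the Theil index of the group means (population-weighted), so it is never negative.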

source_decompose

source_decompose(data: DataFrame, sources: Sequence[str], weights: Optional[Union[str, ndarray]] = None) -> SourceDecompResult

Lerman-Yitzhaki (1985) Gini source decomposition.

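Total income y is the sum of the sources y_k, and the Gini decomposes as G_total = Σ_k S_k · R_k · G_k. Source k's share of total inequality is therefore S_k · R_k · G_k / G_total, where S_k = μ_k / μ is its share of mean total income, R_k = cov(y_k, F(y)) / cov(y_k, F(y_k)) is the Gini correlation between the source and total income, and G_k is the source's own Gini.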
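The Lerman-Yitzhaki identity follows from the covariance form of the Gini, G = 2 cov(y, F(y)) / μ. The sketch below is an unweighted NumPy illustration with hypothetical helper names and simulated log-normal sources; it assumes no ties (so ranks are well defined) and that no source is constant (otherwise cov(y_k, F(y_k)) is zero).

```python
import numpy as np


def cdf_rank(v):
    """Empirical CDF values r/n, with ranks r = 1..n (assumes no ties)."""
    return (np.argsort(np.argsort(v)) + 1) / len(v)


def cov0(a, b):
    """Population (ddof=0) covariance."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def gini(v):
    """Covariance form of the Gini coefficient: 2 cov(v, F(v)) / mean(v)."""
    return 2 * cov0(v, cdf_rank(v)) / v.mean()


def lerman_yitzhaki(sources):
    """Hypothetical helper. sources: (n, K) income components; total = row sums."""
    y = sources.sum(axis=1)
    F = cdf_rank(y)
    G = gini(y)
    rows = []
    for k in range(sources.shape[1]):
        yk = sources[:, k]
        S = yk.mean() / y.mean()                   # share of mean income
        R = cov0(yk, F) / cov0(yk, cdf_rank(yk))   # Gini correlation with total
        Gk = gini(yk)                              # source's own Gini
        rows.append((S, R, Gk, S * R * Gk / G))    # last: share of total Gini
    return G, rows


rng = np.random.default_rng(6)
sources = rng.lognormal(mean=[1.0, 0.5, 0.0], sigma=[0.5, 0.9, 1.2], size=(500, 3))
G, rows = lerman_yitzhaki(sources)
```

Because S_k · R_k · G_k simplifies to 2 cov(y_k, F(y)) / μ, the source shares add up to one by linearity of the covariance.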

shapley_inequality

shapley_inequality(data: DataFrame, y: str, x: Sequence[str], index: str = 'theil_t', weights: Optional[Union[str, ndarray]] = None) -> ShapleyInequalityResult

Shorrocks-Shapley decomposition of an inequality index across covariates.

For each subset S ⊆ covariates, compute the predicted outcome ŷ_S = X_S · β_S (OLS) and evaluate index I(ŷ_S). The marginal contribution of variable j to I is averaged over all orderings yielding its Shapley value φ_j.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x Sequence[str]
required
index str
'theil_t'
weights (str, array or None)
None
Notes

Combinatorial cost: O(2^|x|). For |x| ≤ 10 this is fine; for larger x the function warns and uses a random permutation sampler.
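The exact Shapley computation described above can be sketched by enumerating subsets. This NumPy version is an illustration, not the statspai code: it assumes the fitted values ŷ_S include a constant, uses Theil T (so fitted values must stay positive), and has hypothetical helper names. Shapley efficiency means the φ_j add up to I(ŷ_full) − I(ŷ_∅), and I(ŷ_∅) is zero because the intercept-only fit is constant.

```python
import itertools
import math

import numpy as np


def theil_t(v):
    s = v / v.mean()
    return np.mean(s * np.log(s))


def fitted_index(X, y, cols, index):
    """Index of OLS fitted values using covariates in cols plus a constant."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return index(Z @ beta)


def shapley_inequality(X, y, index=theil_t):
    """Hypothetical helper: exact Shapley values by full subset enumeration."""
    p = X.shape[1]
    cache = {}

    def v(S):
        key = tuple(sorted(S))
        if key not in cache:
            cache[key] = fitted_index(X, y, key, index)
        return cache[key]

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                phi[j] += weight * (v(S + (j,)) - v(S))
    return phi, v(tuple(range(p))), v(())


rng = np.random.default_rng(7)
X = rng.normal(size=(800, 3))
y = 10 + X @ np.array([1.0, 0.5, 0.2]) + rng.normal(0, 0.5, 800)
phi, v_full, v_empty = shapley_inequality(X, y)
```

The cache keeps the cost at 2^p regressions, which is why exact enumeration is only practical for a small number of covariates.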

kitagawa_decompose

kitagawa_decompose(data: DataFrame, rate: str, group: str, by: Union[str, Sequence[str]], weights: Optional[str] = None, normalize: str = 'symmetric') -> KitagawaResult

Kitagawa (1955) two-factor rate decomposition.

Parameters:

Name Type Description Default
data DataFrame

Tidy data. Either individual-level (aggregated internally) or pre-aggregated cell-level with columns: group, by, rate, optional weights (population size in each cell).

required
rate str

Column holding the category-specific rate (or 0/1 outcome at the individual level).

required
group str

Binary group indicator.

required
by str or list of str

Category variable(s) defining cells.

required
weights str or None

Cell population weights. If None, each row is treated as individual-level data (weight = 1).

None
normalize ('symmetric', 'a', 'b')
  • 'a': rate effect evaluated at A's composition
  • 'b': rate effect evaluated at B's composition
  • 'symmetric': average (default)
'symmetric'
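The symmetric two-factor split can be written in a few lines of plain Python. This sketch works on cell-level population shares and rates (hypothetical inputs, not the DataFrame interface above) and shows why the 'symmetric' option is exact: averaging the other factor across the two groups makes the composition and rate effects add up to the total difference with no interaction residual.

```python
def kitagawa(w_a, r_a, w_b, r_b):
    """Hypothetical helper: symmetric Kitagawa split of sum(w_a*r_a) - sum(w_b*r_b).

    w_*: population shares per cell (each sums to 1); r_*: cell-specific rates.
    """
    cells = list(zip(w_a, w_b, r_a, r_b))
    composition = sum((wa - wb) * (ra + rb) / 2 for wa, wb, ra, rb in cells)
    rate = sum((ra - rb) * (wa + wb) / 2 for wa, wb, ra, rb in cells)
    total = sum(w * r for w, r in zip(w_a, r_a)) - sum(w * r for w, r in zip(w_b, r_b))
    return total, composition, rate


# Three illustrative age groups with different compositions and rates.
total, composition, rate = kitagawa(
    w_a=[0.2, 0.5, 0.3], r_a=[0.01, 0.02, 0.05],
    w_b=[0.4, 0.4, 0.2], r_b=[0.012, 0.018, 0.04],
)
```

The 'a' and 'b' options instead evaluate one effect at a single group's values, which leaves the interaction folded into the other term.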

das_gupta

das_gupta(data_a: DataFrame, data_b: DataFrame, factor_names: Sequence[str]) -> DasGuptaResult

Das Gupta (1993) multi-factor decomposition.

Decomposes the difference in a product-form aggregate into each factor's contribution using symmetric averaging across all possible orderings.

Parameters:

Name Type Description Default
data_a pd.DataFrame with the same factor columns.

Each row contributes the factor values. The aggregate for each group is computed as Σ_i ∏_f factor_{f,i}.

For single-row DataFrames (one population, no stratification) the aggregate is simply ∏_f factor_f.

required
data_b pd.DataFrame with the same factor columns.

Same layout and aggregation as data_a, for the second group.

required
factor_names list of factor column names.
required
Notes

Assumes: rate = f_1 * f_2 * ... * f_m (aggregate product form). For additive forms use kitagawa_decompose.
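The "symmetric averaging across all possible orderings" described above can be implemented literally: for each ordering of the factors, switch them one by one from group B's values to group A's and credit each change in the aggregate to the factor just switched. This NumPy sketch is an assumption-labelled illustration, not the statspai code or Das Gupta's closed-form expressions; it uses the Σ_i ∏_f factor_{f,i} aggregate and assumes both groups' rows are aligned stratum by stratum.

```python
import itertools
import math

import numpy as np


def aggregate(F):
    """Sum over rows of the product over factor columns."""
    return np.prod(F, axis=1).sum()


def das_gupta_orderings(Fa, Fb):
    """Hypothetical helper: average sequential B -> A switches over all orderings.

    Fa, Fb: (rows, m) arrays of factor values for groups A and B.
    """
    m = Fa.shape[1]
    effects = np.zeros(m)
    for order in itertools.permutations(range(m)):
        F = Fb.astype(float).copy()
        prev = aggregate(F)
        for f in order:
            F[:, f] = Fa[:, f]          # switch factor f to A's values
            cur = aggregate(F)
            effects[f] += cur - prev
            prev = cur
    return effects / math.factorial(m)


# Single-row, three-factor example: aggregates 3.0 (A) and 2.0 (B).
effects3 = das_gupta_orderings(np.array([[0.5, 2.0, 3.0]]), np.array([[0.4, 2.5, 2.0]]))
# Two factors reduce to the symmetric Kitagawa formula (a1 - b1)(a2 + b2) / 2.
effects2 = das_gupta_orderings(np.array([[2.0, 3.0]]), np.array([[1.0, 5.0]]))
```

Each ordering telescopes to aggregate(A) − aggregate(B), so the averaged factor effects always sum to the total difference.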

gap_closing

gap_closing(data: DataFrame, y: str, group: str, x: Sequence[str], method: str = 'aipw', target_dist: int = 1, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> GapClosingResult

Lundberg (2021) gap-closing estimator.

Computes the counterfactual mean gap that would remain if one group's covariate distribution were shifted to match the other's.

Parameters:

Name Type Description Default
data DataFrame
required
y column names
required
group column names
required
x column names
required
method ('regression', 'ipw', 'aipw')

AIPW is doubly robust (recommended).

'aipw'
target_dist (0, 1)
  • 1: shift Group A's covariate distribution to match Group B's
  • 0: shift Group B's to match Group A's
1
trim float — propensity trim
0.001
inference ('analytical', 'bootstrap', 'none')
'analytical'
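The regression version of the gap-closing idea can be sketched with OLS: fit one group's outcome model, predict it on the other group's covariates, and compare with the observed gap. This is only an illustration of the regression estimator (not AIPW), the helper name is hypothetical, and which group is shifted here (group 1 moved to group 0's X) is an arbitrary choice that should not be read as the meaning of target_dist. When both groups share the same outcome model, the remaining gap should be near zero.

```python
import numpy as np


def gap_closing_regression(y, g, X):
    """Hypothetical helper: regression-based gap-closing sketch.

    Fits E[Y | X] in group 1 and predicts group 1's mean if its covariate
    distribution matched group 0's.
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    in0, in1 = g == 0, g == 1
    beta1 = np.linalg.lstsq(Z[in1], y[in1], rcond=None)[0]
    cf_mean1 = (Z[in0] @ beta1).mean()      # group 1's outcome model, group 0's X
    observed_gap = y[in0].mean() - y[in1].mean()
    remaining_gap = y[in0].mean() - cf_mean1
    return observed_gap, remaining_gap


rng = np.random.default_rng(8)
n = 2000
g = rng.integers(0, 2, n)
X = np.where(g == 0, rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n))
y = 1.0 + 2.0 * X + rng.normal(0, 0.1, n)   # same outcome model in both groups
observed, remaining = gap_closing_regression(y, g, X)
```

The AIPW option adds a propensity-weighted residual correction to this prediction, which is what makes it doubly robust.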

mediation_decompose

mediation_decompose(data: DataFrame, y: str, treatment: str, mediator: str, covariates: Optional[Sequence[str]] = None, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> MediationDecompResult

Linear nested-models mediation decomposition (VanderWeele 2014 four-way simplified to natural direct / indirect under linearity).

Parameters:

Name Type Description Default
data DataFrame
required
y str — continuous outcome
required
treatment str — binary exposure
required
mediator str — mediator
required
covariates list of str or None
None
inference ('analytical', 'bootstrap')
'analytical'

Returns:

Type Description
MediationDecompResult with NDE, NIE, CDE, proportion mediated.
Notes

Under the purely linear model used here, the controlled direct effect CDE(m*) evaluated at the reference level m* = E[M | A=0] coincides numerically with the natural direct effect (NDE). The cde field is therefore redundant in this implementation — it is retained for API compatibility with VanderWeele's four-way decomposition, but users should not treat it as independent information from nde unless a nonlinear or interaction-heterogeneous extension is added.
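Under the linear, no-interaction models described in the Notes, the natural effects reduce to products of regression coefficients: NDE is the exposure coefficient in the outcome model and NIE is (exposure → mediator coefficient) × (mediator → outcome coefficient). The NumPy sketch below (hypothetical helper, simulated data, no exposure-mediator interaction assumed) also checks that NDE + NIE reproduces the total effect from the regression that omits the mediator.

```python
import numpy as np


def linear_mediation(y, a, m, C):
    """Hypothetical helper: product-of-coefficients NDE / NIE, linear models."""
    n = len(y)
    ones = np.ones(n)
    # Mediator model: M ~ 1 + A + C
    alpha = np.linalg.lstsq(np.column_stack([ones, a, C]), m, rcond=None)[0]
    # Outcome model: Y ~ 1 + A + M + C
    theta = np.linalg.lstsq(np.column_stack([ones, a, m, C]), y, rcond=None)[0]
    nde = theta[1]
    nie = alpha[1] * theta[2]
    # Total effect: Y ~ 1 + A + C (mediator omitted)
    total = np.linalg.lstsq(np.column_stack([ones, a, C]), y, rcond=None)[0][1]
    return nde, nie, total


rng = np.random.default_rng(7)
n = 3000
C = rng.normal(size=(n, 2))
a = rng.integers(0, 2, n)
m = 0.5 + 0.8 * a + C @ np.array([0.3, -0.2]) + rng.normal(0, 1, n)
y = 1.0 + 0.4 * a + 0.6 * m + C @ np.array([0.2, 0.1]) + rng.normal(0, 1, n)
nde, nie, total = linear_mediation(y, a, m, C)
```

The additivity is exact OLS algebra (the same omitted-variable identity used by Gelbach), which is also why the CDE carries no extra information in this linear setting.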

disparity_decompose

disparity_decompose(data: DataFrame, y: str, group: str, mediator: str, covariates: Optional[Sequence[str]] = None, target_level: Optional[float] = None) -> DisparityDecompResult

Jackson & VanderWeele (2018) causal disparity decomposition.

Decomposes an observed group disparity in Y into:
- initial disparity: what would remain if mediator M were set to a reference level (e.g. Group A's M distribution).
- mediator-attributable: the complementary share.

Parameters:

Name Type Description Default
data DataFrame
required
y str — outcome
required
group str — binary group (0/1, where 1 = disadvantaged)
required
mediator str — mediator
required
covariates list or None
None
target_level float or None

Value at which to fix mediator for the "initial" counterfactual. Default: mean of M in reference group (group=0).

None

yu_elwert_decompose

yu_elwert_decompose(data: DataFrame, y: str, treatment: str, group: str, x: Sequence[str], *, method: str = 'plugin', inference: str = 'bootstrap', n_boot: int = 499, alpha: float = 0.05, trim: float = 0.005, cluster: Optional[str] = None, seed: Optional[int] = 12345) -> YuElwertResult

Nonparametric causal decomposition of a group disparity.

Parameters:

Name Type Description Default
data DataFrame

Long-format panel with one row per observation.

required
y str

Name of the (continuous) outcome column.

required
treatment str

Binary treatment indicator (0/1).

required
group str

Binary group indicator (0/1) — 1 = advantaged / index group.

required
x sequence of str

Adjustment covariates (used to identify within-group CATEs).

required
method ('plugin', 'efficient')

"plugin" uses within-cell OLS for outcomes and within-group logit for the propensity and computes plug-in expectations (Yu-Elwert 2025, Section 4.1). "efficient" augments each moment with the doubly-robust correction term — recommended when nuisance functions might be misspecified.

"plugin"
inference ('bootstrap', 'none')

"bootstrap" returns SEs and percentile CIs from the non-parametric (cluster-aware) bootstrap. "none" skips inference.

"bootstrap"
n_boot int
499
alpha float

Significance level; two-sided CIs have coverage 1 − alpha.

0.05
trim float

Lower/upper clip for fitted propensities (only used in method="efficient").

0.005
cluster str or None

Column name to use for cluster bootstrap.

None
seed int or None
12345

Returns:

Type Description
YuElwertResult
Notes

Identification requires conditional ignorability of treatment given (R, X) (no unmeasured confounders within group). The framework does not require R itself to be unconfounded, distinguishing it from causal-mediation approaches.

The "selection" component is zero whenever individuals are randomly assigned to treatment (no selection on individual gain) or whenever the CATE is constant within group (no heterogeneity to select on). A non-zero selection term — particularly one of opposite sign in the two groups — flags that targeting differs systematically across groups, often the lever a designer can pull.
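The four components rest on an accounting identity: since Y = Y0 + D·τ, each group mean is E(Y0|r) + E(D|r)·E(τ|r) + Cov(D, τ|r), and differencing across groups gives baseline + prevalence + effect + selection. The sketch below is an oracle illustration with simulated potential outcomes, not the plugin or efficient estimator yu_elwert_decompose implements; names and data-generating choices are hypothetical. It also mirrors the Note above: with random assignment the selection term is close to zero.

```python
import numpy as np


def yu_elwert_oracle(y0, y1, d, r):
    """Hypothetical helper: four components from known potential outcomes.

    d: realized binary treatment; r: group (1 = advantaged). Uses ddof=0
    moments so the identity holds exactly in-sample.
    """
    tau = y1 - y0
    y = np.where(d == 1, y1, y0)
    g1, g0 = r == 1, r == 0
    p1, p0 = d[g1].mean(), d[g0].mean()
    t1, t0 = tau[g1].mean(), tau[g0].mean()

    def cov(a, b):
        return np.mean((a - a.mean()) * (b - b.mean()))

    return {
        "disparity": y[g1].mean() - y[g0].mean(),
        "baseline": y0[g1].mean() - y0[g0].mean(),
        "prevalence": (p1 - p0) * t0,
        "effect": p1 * (t1 - t0),
        "selection": cov(d[g1], tau[g1]) - cov(d[g0], tau[g0]),
    }


rng = np.random.default_rng(9)
n = 20000
r = rng.integers(0, 2, n)
x = rng.normal(size=n)
y0 = 1.0 + 0.5 * r + x + rng.normal(0, 1, n)
tau = 1.0 + 0.5 * r + 0.5 * x + rng.normal(0, 0.5, n)
y1 = y0 + tau
# Group 1 selects into treatment on gains; group 0 is treated at random.
d = np.where(r == 1, tau + rng.normal(0, 0.5, n) > 1.2, rng.random(n) < 0.4).astype(int)
res = yu_elwert_oracle(y0, y1, d, r)
# Same population, but everyone is assigned at random.
d_rand = (rng.random(n) < 0.5).astype(int)
res_rand = yu_elwert_oracle(y0, y1, d_rand, r)
```

In real data Y0, Y1, and τ are not observed, which is why the estimator needs conditional ignorability given (R, X) and fitted nuisance models.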

Examples:

>>> import statspai as sp
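>>> df = sp.decomposition.datasets.disparity_panel()
>>> # Column names below are placeholders: as documented under
>>> # disparity_panel(), that dataset has columns group, education,
>>> # parent_income, age, income and no binary treatment column.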
>>> r = sp.decompose(
...     "yu_elwert", data=df, y="y", treatment="t", group="r",
...     x=["x1", "x2"]
... )
>>> r.summary()

decompose

decompose(method: str, /, **kwargs) -> Any

Unified entry point for all decomposition methods.

Parameters:

Name Type Description Default
method str

One of the methods listed in available_methods(). Aliases are supported (e.g. 'mm' → 'machado_mata').

required
**kwargs

Method-specific keyword arguments (see the individual function signatures for details).

{}

Returns:

Type Description
method-specific result class

Exposes .summary(), .plot(), .to_latex(), and ._repr_html_().

Examples:

>>> import statspai as sp
>>> df = sp.decomposition.datasets.cps_wage()
>>> r = sp.decompose('oaxaca', data=df, y='log_wage', group='female',
...                  x=['education', 'experience', 'tenure'])
>>> r.summary()
>>> r = sp.decompose('ffl', data=df, y='log_wage', group='female',
...                  x=['education', 'experience', 'tenure'],
...                  stat='quantile', tau=0.5)
>>> r.summary()
>>> # NOTE: ``method='aipw'`` below is passed through to
>>> # ``gap_closing``'s own ``method`` parameter; the dispatcher's
>>> # own method arg is positional-only so there is no collision.
>>> r = sp.decompose('gap_closing', data=df, y='log_wage',
...                  group='female',
...                  x=['education', 'experience', 'tenure'],
...                  method='aipw')

Convention warning for dfl vs machado_mata / melly / cfm: reference=0 has different semantics across method families (reweighting vs coefficient-swap). See the per-method docstrings before comparing composition/structure estimates across methods.
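The note in the examples about method='aipw' reaching gap_closing relies on Python's positional-only parameters (PEP 570): because the dispatcher declares method before /, a keyword argument named method lands in **kwargs instead of clashing. The sketch below shows that pattern with a hypothetical registry, alias table, and stub; the real statspai registry and alias handling may differ.

```python
from typing import Any, Callable, Dict

# Hypothetical registry and alias table, for illustration only.
_REGISTRY: Dict[str, Callable[..., Any]] = {}
_ALIASES = {"mm": "machado_mata"}


def register(name: str):
    def wrap(fn):
        _REGISTRY[name] = fn
        return fn
    return wrap


@register("gap_closing")
def gap_closing_stub(*, method: str = "aipw", **kwargs) -> str:
    """Stand-in for a real estimator; just reports the method it received."""
    return f"gap_closing(method={method!r})"


def decompose(method: str, /, **kwargs) -> Any:
    """Dispatch by name. `method` is positional-only, so a method= keyword
    is forwarded to the target function instead of clashing."""
    name = _ALIASES.get(method, method)
    if name not in _REGISTRY:
        raise ValueError(f"unknown method {method!r}; available: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)
```

Calling decompose("gap_closing", method="ipw") therefore forwards method="ipw" to the stub, while decompose("gap_closing") falls back to the stub's own default.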

available_methods

available_methods() -> list[str]

Return list of all registered decomposition method names.

cps_wage

cps_wage(n: int = 3000, seed: Optional[int] = 42) -> DataFrame

CPS-style wage data with a gender gap.

Columns:
- female: int {0, 1}
- education: years
- experience: years
- tenure: years
- union: int {0, 1}
- married: int {0, 1}
- log_wage: float

chilean_households

chilean_households(n: int = 2500, seed: Optional[int] = 42) -> DataFrame

Chilean-style household income with urban/rural gap.

Columns:
- rural: int {0, 1}
- head_education: years
- head_age: years
- household_size: int
- log_income: float

mincer_wage_panel

mincer_wage_panel(n: int = 5000, seed: Optional[int] = 42) -> DataFrame

Two-period Mincer wage distribution with a structural shift.

Useful for DFL / FFL examples where Group 0 is early period and Group 1 is late period.

Columns:
- period: int {0, 1}
- education: years
- experience: years
- union: int
- occupation_high_skill: int
- log_wage: float

disparity_panel

disparity_panel(n: int = 3000, seed: Optional[int] = 42) -> DataFrame

Synthetic disparity panel with treatment, mediator, outcome.

Columns:
- group: int {0, 1} (disadvantaged = 1)
- education: years (mediator)
- parent_income: float (confounder)
- age: years (confounder)
- income: float (outcome)