statspai.decomposition¶
decomposition ¶
Decomposition Analysis module for StatsPAI.
Decomposition toolkit covering mean, distributional, inequality,
demographic, and causal decomposition methods under a
unified API: sp.decompose(method, **kwargs).
Methods (19 in total — Yu-Elwert added in v1.15)
Mean decomposition
- oaxaca — Blinder-Oaxaca (Blinder 1973; Oaxaca 1973) with 5
reference-coefficient choices (A, B, pooled/Neumark, Cotton, Reimers)
- gelbach — Gelbach (2016) sequential orthogonal decomposition of
omitted-variable bias
- fairlie — Fairlie (2005) nonlinear decomposition for logit/probit
- bauer_sinning / yun_nonlinear — Bauer-Sinning (2008) + Yun
(2005) detailed nonlinear decomposition
Distributional decomposition
- rif — Recentered Influence Function regression + OB decomposition
(Firpo-Fortin-Lemieux 2009)
- ffl — Firpo-Fortin-Lemieux (2018) two-step detailed decomposition
- dfl — DiNardo-Fortin-Lemieux (1996) reweighting
- machado_mata — Machado-Mata (2005) quantile decomposition
- melly — Melly (2005) analytical quantile decomposition
- cfm — Chernozhukov-Fernández-Val-Melly (2013) counterfactual
distributions via distribution regression
Inequality decomposition
- subgroup — between/within decomposition (Theil T/L, GE, Gini,
Atkinson, CV²)
- shapley_inequality — Shorrocks-Shapley (2013) allocation of
inequality to covariates
- gini_source — Lerman-Yitzhaki (1985) Gini source decomposition
Demographic / standardisation
- kitagawa — Kitagawa (1955) two-factor rate decomposition
- das_gupta — Das Gupta (1993) multi-factor decomposition
Causal decomposition
- gap_closing — Lundberg (2022) gap-closing estimator
(regression / IPW / AIPW)
- mediation — VanderWeele (2014) natural direct/indirect effects
- disparity / causal_jvw — Jackson-VanderWeele (2018) causal
disparity decomposition
- yu_elwert — Yu & Elwert (2025) nonparametric causal decomposition
of group disparities into baseline, prevalence, effect, and selection
components (efficient-influence-function-based; ML-friendly)
Unified Entry
sp.decompose(method, **kwargs) dispatches to any of the above; method is passed positionally.
Polish (v1.15)
Every result class now inherits DecompResultMixin, exposing a
common .confint(), .cite(), .to_dict(), .to_json(),
.to_excel(), and .to_word() surface in addition to each
method's bespoke .summary() / .plot() / .to_latex().
Plots share a common palette and minimalist style via
statspai.decomposition.plots (forest plots, mediation forest,
Yu-Elwert mechanism plot, RIF heatmap, …).
OaxacaResult ¶
Bases: DecompResultMixin
Result container for Oaxaca-Blinder decomposition.
Attributes:
| Name | Type | Description |
|---|---|---|
| overall | dict | Keys: |
| detailed | DataFrame | Variable-level decomposition with columns |
| group_stats | dict | Per-group means, coefficients, standard errors, sample sizes. |
| reference | str or int | Reference weight specification used. |
plot ¶
Bar / forest chart of per-variable explained contributions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | ('waterfall', 'forest') | | "waterfall" |
GelbachResult ¶
Bases: DecompResultMixin
Result container for Gelbach (2016) decomposition.
Attributes:
| Name | Type | Description |
|---|---|---|
| total_change | float | Total change in the base coefficient when added controls are included: beta_base - beta_full. |
| decomposition | DataFrame | Per-variable contributions with columns |
| base_coef | float | Coefficient of interest from the base (short) regression. |
| full_coef | float | Coefficient of interest from the full (long) regression. |
| base_var | str | Name of the variable of interest. |
DFLResult dataclass ¶
Bases: DecompResultMixin
Container for DFL reweighting decomposition results.
FFLResult dataclass ¶
Bases: DecompResultMixin
Container for Firpo-Fortin-Lemieux two-step decomposition.
MachadoMataResult dataclass ¶
Bases: DecompResultMixin
Container for Machado-Mata decomposition.
MellyResult dataclass ¶
Bases: DecompResultMixin
Container for Melly quantile decomposition.
CFMResult dataclass ¶
Bases: DecompResultMixin
Chernozhukov-Fernández-Val-Melly counterfactual distribution result.
YuElwertResult dataclass ¶
Bases: DecompResultMixin
Yu-Elwert (2025) causal decomposition of a group disparity.
Attributes:
| Name | Type | Description |
|---|---|---|
| disparity | float | Observed gap |
| baseline | float | Counterfactual disparity if no one were treated. |
| prevalence | float | Contribution of differential treatment uptake (group A vs. B), scaled by the reference group's average treatment effect. |
| effect | float | Contribution of group heterogeneity in average treatment effects, scaled by the advantaged group's treatment prevalence. |
| selection | float | Group-specific covariance between treatment assignment and individual-level effect heterogeneity, the signature mechanism of Yu-Elwert. |
| se | dict[str, float] \| None | Standard errors keyed by component ( |
| ci | dict[str, (float, float)] \| None | Two-sided 95% confidence intervals (matching |
| detailed | DataFrame | Tidy table of (component, value, se, ci_low, ci_high). |
| nuisance | dict | Diagnostic snapshot: group sizes, fitted per-cell means, plus |
| method | str | |
gelbach ¶
gelbach(data: DataFrame, y: str, base_x: Sequence[str], added_x: Sequence[str], var_of_interest: Optional[str] = None, alpha: float = 0.05) -> GelbachResult
Gelbach (2016) decomposition of omitted variable bias.
When controls are added to a regression, the coefficient on a variable of interest may change. This function decomposes that change into contributions from each added variable, answering: "Which added controls explain the change, and by how much?"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input dataset. | required |
| y | str | Outcome variable name. | required |
| base_x | list of str | Variables in the base (short) specification. | required |
| added_x | list of str | Variables added to obtain the full (long) specification. | required |
| var_of_interest | str | Which base variable's coefficient change to decompose. Defaults to the first element of | None |
| alpha | float | Significance level. | 0.05 |
Returns:
| Type | Description |
|---|---|
| GelbachResult | Result object with |
Notes
The Gelbach identity:

$$
\hat\beta^{\text{base}}_k - \hat\beta^{\text{full}}_k
= \sum_{j \in \text{added}} \tilde\gamma_{kj} \, \hat\beta^{\text{full}}_j
$$

where $\tilde\gamma_{kj}$ is the coefficient on base variable $k$ from regressing
added variable $j$ on all base variables (including the constant).
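The identity can be checked numerically. The sketch below is not StatsPAI's implementation: it uses a hypothetical pure-Python `ols` helper and simulated data (the names `x1`, `z1`, `z2` are illustrative only) to confirm that the base-minus-full coefficient gap equals the sum of the per-control contributions.

```python
import random

def ols(X, y):
    """OLS coefficients via Gauss-Jordan elimination on the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
n = 300
x1 = [rng.gauss(0, 1) for _ in range(n)]                # variable of interest
z1 = [0.8 * x + rng.gauss(0, 1) for x in x1]            # added control 1
z2 = [-0.5 * x + rng.gauss(0, 1) for x in x1]           # added control 2
y = [1 + 2 * a + 0.5 * b - 1.0 * c + rng.gauss(0, 1)
     for a, b, c in zip(x1, z1, z2)]

base_X = [[1.0, a] for a in x1]
full_X = [[1.0, a, b, c] for a, b, c in zip(x1, z1, z2)]

beta_base = ols(base_X, y)[1]
beta_full = ols(full_X, y)
# gamma_j: coefficient on x1 when added control j is regressed on the base variables
gamma = [ols(base_X, z)[1] for z in (z1, z2)]

contributions = [g * b for g, b in zip(gamma, beta_full[2:])]
total_change = beta_base - beta_full[1]
print(total_change, sum(contributions))
```

The two printed numbers agree up to floating-point error because the identity is an in-sample algebraic fact about least squares, not an asymptotic approximation.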
Examples:
rifreg ¶
rifreg(formula: str, data: DataFrame, statistic: StatisticKind = 'quantile', tau: float = 0.5) -> RIFResult
RIF regression (Firpo, Fortin & Lemieux 2009).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | | required |
| data | DataFrame | | required |
| statistic | ('quantile', 'variance', 'gini') | | "quantile" |
| tau | float | Quantile level (default 0.5 = median UQPE). | 0.5 |
rif_decomposition ¶
rif_decomposition(formula: str, data: DataFrame, group: str, statistic: StatisticKind = 'quantile', tau: float = 0.5, reference: int = 0) -> RIFDecompositionResult
RIF Oaxaca-Blinder decomposition (FFL 2009, Section 5).
Decomposes the between-group difference in a distributional statistic into explained (covariate endowment) and unexplained (coefficient) parts at the chosen statistic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| group | str | Binary (0/1) group indicator column. | required |
| reference | int | Which group's coefficients to use as the reference (0 or 1). | 0 |
rif_values ¶
Compute the RIF of each observation.
Delegates to statspai.decomposition._common.influence_function,
which is the canonical implementation shared with the FFL two-step
decomposition. Supported statistics (expanded in this release):
``quantile`` (with ``tau``), ``mean``, ``variance``, ``std``,
``log_var``, ``iqr``, ``gini``, ``theil_t``, ``theil_l``,
``atkinson`` (ε = 1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | (n,) array | | required |
| statistic | str | | 'quantile' |
| tau | float | Quantile level (only used when | 0.5 |
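For the quantile case, the standard recentered influence function is RIF(y; q_τ) = q_τ + (τ − 1{y ≤ q_τ}) / f(q_τ), where f is the outcome density. The following is a minimal stdlib sketch of that formula, not the library's `influence_function`; the helper name `rif_quantile` and the rule-of-thumb Gaussian-kernel bandwidth are assumptions made for illustration.

```python
import math
import random

def rif_quantile(y, tau=0.5):
    """Return (q_tau, RIF values) with RIF_i = q + (tau - 1[y_i <= q]) / f(q)."""
    n = len(y)
    ys = sorted(y)
    q = ys[max(0, math.ceil(tau * n) - 1)]          # empirical tau-quantile
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)                     # rule-of-thumb bandwidth (assumption)
    # Gaussian kernel density estimate evaluated at q
    f_q = sum(math.exp(-0.5 * ((q - v) / h) ** 2) for v in y) / (
        n * h * math.sqrt(2 * math.pi))
    return q, [q + (tau - (1.0 if v <= q else 0.0)) / f_q for v in y]

rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(200)]
q, rif = rif_quantile(y, tau=0.5)
print(q, sum(rif) / len(rif))
```

The RIF takes only two values (one for observations at or below the quantile, one for those above), and its sample mean recovers the quantile itself, which is what makes regressing it on covariates interpretable as an unconditional quantile effect.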
dfl_decompose ¶
dfl_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], stat: str = 'mean', tau: float = 0.5, reference: int = 0, weights: Optional[Union[str, ndarray]] = None, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, quantile_grid: Optional[Sequence[float]] = None, seed: Optional[int] = 12345) -> DFLResult
DFL (1996) reweighting decomposition at a chosen distributional statistic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable name. | required |
| group | str | Binary (0/1) group indicator. | required |
| x | Sequence[str] | Covariates used for the propensity model. | required |
| stat | ('mean', 'variance', 'std', 'quantile', 'iqr', 'gini', 'log_var') | | 'mean' |
| tau | float | Quantile level (when stat='quantile'). | 0.5 |
| reference | (0, 1) | .. warning:: | 0 |
| weights | str, array or None | Sample weights. | None |
| trim | float | Clip propensity scores to [trim, 1-trim]. | 0.001 |
| inference | ('none', 'bootstrap', 'analytical') | | 'analytical' |
| n_boot | int | Bootstrap replications. | 299 |
| alpha | float | Significance level for confidence intervals (coverage 1 - alpha). | 0.05 |
| quantile_grid | sequence of τ ∈ (0, 1) or None | If provided, also compute quantile-process decomposition on this grid. | None |
| seed | int or None | | 12345 |
Returns:
| Type | Description |
|---|---|
| DFLResult | |
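The reweighting idea can be sketched with one discrete covariate, where the propensity is just a cell frequency. This is an illustration under stated assumptions, not the library's code: the data are made up, the direction (group 0 reweighted to group 1's covariate mix) is chosen only for the example and is not a statement about what `reference=0` means, and the factor p(x)/(1-p(x)) · (1-π)/π is the usual DFL weight with p(x) = P(group=1 | x).

```python
from collections import Counter

# (group, x_cell, y): a tiny hypothetical sample with one discrete covariate
rows = [(0, "lo", 10.0), (0, "lo", 12.0), (0, "lo", 11.0), (0, "hi", 20.0),
        (1, "lo", 9.0), (1, "hi", 18.0), (1, "hi", 19.0), (1, "hi", 21.0)]
trim = 0.001

n = len(rows)
pi = sum(g for g, _, _ in rows) / n                 # P(group = 1)
n_x = Counter(x for _, x, _ in rows)
n1_x = Counter(x for g, x, _ in rows if g == 1)

def psi(x):
    """Reweighting factor p(x)/(1-p(x)) * (1-pi)/pi with a trimmed propensity."""
    p = min(max(n1_x[x] / n_x[x], trim), 1 - trim)
    return p / (1 - p) * (1 - pi) / pi

g0 = [(x, y) for g, x, y in rows if g == 0]
g1 = [y for g, _, y in rows if g == 1]
mean0 = sum(y for _, y in g0) / len(g0)
mean1 = sum(g1) / len(g1)
# Group 0 outcomes evaluated under group 1's covariate mix
mean_cf = sum(psi(x) * y for x, y in g0) / sum(psi(x) for x, _ in g0)

composition = mean_cf - mean0      # part due to different covariate distributions
structure = mean1 - mean_cf        # part due to different outcome structure
print(mean0, mean1, mean_cf, composition, structure)
```

Composition and structure sum to the raw gap by construction; trimming matters only when a cell's propensity is close to 0 or 1, where the weight would otherwise explode.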
ffl_decompose ¶
ffl_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], stat: str = 'quantile', tau: float = 0.5, reference: int = 0, weights: Optional[Union[str, ndarray]] = None, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> FFLResult
Firpo-Fortin-Lemieux two-step detailed distributional decomposition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| group | str | Binary group indicator {0, 1}. | required |
| x | Sequence[str] | | required |
| stat | {'quantile', 'mean', 'variance', 'std', 'iqr', 'gini', | | 'quantile' |
| tau | float | Quantile level (for stat='quantile'). | 0.5 |
| reference | int {0, 1} | 0: B reweighted to look like A's X (composition = effect of A's X on B's outcomes relative to observed B) | 0 |
| weights | (str, array or None) | | None |
| trim | float | Propensity trim. | 0.001 |
| inference | ('analytical', 'bootstrap', 'none') | | 'analytical' |
| n_boot | int | | 299 |
| alpha | float | | 0.05 |
| seed | int or None | | 12345 |
melly_decompose ¶
melly_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], tau_grid: Optional[Sequence[float]] = None, reference: int = 0, n_tau_qr: int = 99) -> MellyResult
Melly (2005) quantile decomposition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Column name. | required |
| group | str | Column name. | required |
| x | Sequence[str] | Column names. | required |
| tau_grid | Sequence[float] or None | Reporting τ grid. | None |
| reference | (0, 1) | Same convention as | 0 |
| n_tau_qr | int | QR estimation grid resolution. | 99 |
Returns:
| Type | Description |
|---|---|
| MellyResult | |
cfm_decompose ¶
cfm_decompose(data: DataFrame, y: str, group: str, x: Sequence[str], tau_grid: Optional[Sequence[float]] = None, reference: int = 0, n_thresh: int = 40, ks_test: bool = True) -> CFMResult
Chernozhukov-Fernández-Val-Melly (2013) counterfactual decomposition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Column name. | required |
| group | str | Column name. | required |
| x | Sequence[str] | Column names. | required |
| tau_grid | Sequence[float] or None | | None |
| reference | (0, 1) | Same convention as | 0 |
| n_thresh | int | Number of thresholds for distribution regression. | 40 |
| ks_test | bool | Whether to compute the Kolmogorov-Smirnov gap test. | True |
Returns:
| Type | Description |
|---|---|
| CFMResult | |
fairlie ¶
fairlie(data: DataFrame, y: str, group: str, x: Sequence[str], model: str = 'logit', reference: int = 0, n_sim: int = 500, seed: Optional[int] = 12345) -> NonlinearDecompResult
Fairlie (2005) nonlinear decomposition for binary outcomes.
Procedure: fit model on reference group; rank-match one group onto the other; compute mean predicted probability under counterfactual X; variable-level contribution = change in mean prediction when that variable is swapped to the other group's value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Binary outcome {0, 1}. | required |
| group | str | Binary group indicator. | required |
| x | Sequence[str] | | required |
| model | ('logit', 'probit') | | 'logit' |
| reference | (0, 1) | | 0 |
| n_sim | int | Number of random matchings to average over. | 500 |
| seed | int or None | | 12345 |
bauer_sinning ¶
bauer_sinning(data: DataFrame, y: str, group: str, x: Sequence[str], model: str = 'logit', reference: int = 0, variant: str = 'yun') -> NonlinearDecompResult
Bauer-Sinning (2008) nonlinear Oaxaca-Blinder decomposition with Yun (2004, 2005) weights for detailed contributions.
Implements the three-fold equivalent:

gap = [E(p_a(X_a)) - E(p_r(X_a))] # not used here

but uses Yun's weight decomposition:

explained_j = w_j · (E(p_r(X_a)) - E(p_r(X_b)))

where w_j = (Δx̄_j · β_r_j) / Σ_k (Δx̄_k · β_r_k)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | ('logit', 'probit') | | 'logit' |
| reference | (0, 1) | | 0 |
| variant | 'yun' | | 'yun' |
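The Yun weights described above are simple arithmetic once the mean covariate gaps and reference coefficients are in hand. The numbers below are hypothetical, and `total_explained` stands in for E(p_r(X_a)) - E(p_r(X_b)), which is assumed to have been computed from the fitted logit or probit model.

```python
# Hypothetical inputs: mean covariate gaps and reference-group coefficients
delta_xbar = {"education": 0.8, "experience": -1.5, "tenure": 0.4}
beta_ref = {"education": 0.30, "experience": 0.05, "tenure": 0.10}
total_explained = 0.06   # E(p_r(X_a)) - E(p_r(X_b)), assumed already computed

# w_j = (delta_xbar_j * beta_r_j) / sum_k (delta_xbar_k * beta_r_k)
raw = {j: delta_xbar[j] * beta_ref[j] for j in delta_xbar}
denom = sum(raw.values())
weights = {j: raw[j] / denom for j in raw}

# explained_j = w_j * total explained gap
explained = {j: weights[j] * total_explained for j in raw}
for j in raw:
    print(j, round(weights[j], 4), round(explained[j], 4))
```

Because the weights sum to one, the detailed contributions add up to the aggregate explained term. A weight can be negative (here for `experience`) when a covariate gap works against the overall explained difference.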
inequality_index ¶
inequality_index(y: ndarray, index: str = 'theil_t', weights: Optional[ndarray] = None, eps: float = 1.0, alpha: Optional[float] = None) -> float
Compute a single inequality index.
subgroup_decompose ¶
subgroup_decompose(data: DataFrame, y: str, by: str, index: str = 'theil_t', weights: Optional[Union[str, ndarray]] = None, eps: float = 1.0, alpha: Optional[float] = None) -> SubgroupDecompResult
Subgroup decomposition (between / within) of an inequality index.
Supported for additive GE family (theil_t, theil_l, mld, ge0, ge1, ge2, cv2, atkinson(ε=1)). Gini returns Dagum (1997) Gini_B / Gini_W / Gini_overlap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome. | required |
| by | str | Grouping variable. | required |
| index | str | Inequality index name. | 'theil_t' |
| weights | (str, array or None) | | None |
| eps | float | Atkinson parameter. | 1.0 |
| alpha | float or None | GE parameter override. | None |
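For the Theil T index, the between/within split is exact: each subgroup's own Theil T and its log relative mean are both weighted by the subgroup's income share. The sketch below works through that identity on made-up incomes; it is unweighted and is not the library's implementation.

```python
import math

def theil_t(values):
    """Theil T = mean of (y/mu) * ln(y/mu) for strictly positive values."""
    mu = sum(values) / len(values)
    return sum((v / mu) * math.log(v / mu) for v in values) / len(values)

# Hypothetical incomes by subgroup
groups = {"urban": [30.0, 45.0, 60.0, 90.0], "rural": [10.0, 15.0, 20.0]}
pooled = [v for vals in groups.values() for v in vals]
n, mu = len(pooled), sum(pooled) / len(pooled)

between = within = 0.0
for vals in groups.values():
    mu_g = sum(vals) / len(vals)
    share = (len(vals) / n) * (mu_g / mu)       # income share of the subgroup
    between += share * math.log(mu_g / mu)      # inequality of subgroup means
    within += share * theil_t(vals)             # share-weighted subgroup inequality

total = theil_t(pooled)
print(total, between, within)
```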
source_decompose ¶
source_decompose(data: DataFrame, sources: Sequence[str], weights: Optional[Union[str, ndarray]] = None) -> SourceDecompResult
Lerman-Yitzhaki (1985) Gini source decomposition.
Total income = Σ sources. Each source's contribution is S_k · R_k · G_k / G_total where S_k is its share of total mean, R_k the Gini correlation with total rank, G_k its own Gini.
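The S_k · R_k · G_k product can be verified on a toy example using the covariance form of the Gini, G = 2 · cov(y, F(y)) / mean(y). The sketch below is an assumption-laden illustration (made-up sources, no ties, no weights, hypothetical helper names `cov` and `frac_rank`), not the library's routine.

```python
def cov(a, b):
    """Population covariance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def frac_rank(v):
    """Fractional ranks i/n (assumes no ties)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    ranks = [0.0] * len(v)
    for pos, idx in enumerate(order, start=1):
        ranks[idx] = pos / len(v)
    return ranks

# Hypothetical income sources per household
sources = {"wages": [20.0, 35.0, 50.0, 80.0, 120.0],
           "transfers": [12.0, 9.0, 7.0, 3.0, 1.0],
           "capital": [0.5, 1.0, 4.0, 10.0, 40.0]}
total = [sum(vals) for vals in zip(*sources.values())]
mu, F_total = sum(total) / len(total), frac_rank(total)
gini_total = 2 * cov(total, F_total) / mu

shares = {}
for name, y_k in sources.items():
    mu_k, F_k = sum(y_k) / len(y_k), frac_rank(y_k)
    S_k = mu_k / mu                                   # share of total mean
    G_k = 2 * cov(y_k, F_k) / mu_k                    # own Gini
    R_k = cov(y_k, F_total) / cov(y_k, F_k)           # Gini correlation with total rank
    shares[name] = S_k * R_k * G_k / gini_total
print(gini_total, shares)
```

The shares sum to one. A source concentrated among low-income households (transfers here) gets a negative Gini correlation and therefore a negative, inequality-reducing contribution.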
shapley_inequality ¶
shapley_inequality(data: DataFrame, y: str, x: Sequence[str], index: str = 'theil_t', weights: Optional[Union[str, ndarray]] = None) -> ShapleyInequalityResult
Shorrocks-Shapley decomposition of an inequality index across covariates.
For each subset S ⊆ covariates, compute the predicted outcome ŷ_S = X_S · β_S (OLS) and evaluate index I(ŷ_S). The marginal contribution of variable j to I is averaged over all orderings yielding its Shapley value φ_j.
Parameters:
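| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | As in the other decomposition functions. | required |
| y | str | As in the other decomposition functions. | required |
| x | Sequence[str] | As in the other decomposition functions. | required |
| index | str | | 'theil_t' |
| weights | (str, array or None) | | None |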
Notes
Combinatorial cost: O(2^|x|). For |x| ≤ 10 this is fine; for larger x the function warns and uses a random permutation sampler.
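The averaging over orderings can be sketched independently of the OLS step. In the example below the index values for each covariate subset are hypothetical numbers supplied directly, standing in for I(y_hat_S); only the Shapley averaging itself is shown.

```python
from itertools import permutations

# Hypothetical index values I(y_hat_S) for each covariate subset S
value = {
    frozenset(): 0.00,
    frozenset({"education"}): 0.10,
    frozenset({"experience"}): 0.04,
    frozenset({"union"}): 0.02,
    frozenset({"education", "experience"}): 0.13,
    frozenset({"education", "union"}): 0.11,
    frozenset({"experience", "union"}): 0.05,
    frozenset({"education", "experience", "union"}): 0.15,
}
covariates = ["education", "experience", "union"]

phi = dict.fromkeys(covariates, 0.0)
orderings = list(permutations(covariates))
for order in orderings:
    seen = frozenset()
    for j in order:
        # Marginal contribution of j given the covariates already included
        phi[j] += (value[seen | {j}] - value[seen]) / len(orderings)
        seen = seen | {j}
print(phi)
```

The values φ_j sum to the index of the full model minus that of the empty model. Enumerating orderings costs |x|! evaluations in this naive form, so a real implementation would cache subset values (2^|x| of them) or sample permutations, as the note above describes.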
kitagawa_decompose ¶
kitagawa_decompose(data: DataFrame, rate: str, group: str, by: Union[str, Sequence[str]], weights: Optional[str] = None, normalize: str = 'symmetric') -> KitagawaResult
Kitagawa (1955) two-factor rate decomposition.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Tidy data. Either individual-level (aggregated internally) or pre-aggregated cell-level with columns: | required |
| rate | str | Column holding the category-specific rate (or 0/1 outcome at the individual level). | required |
| group | str | Binary group indicator. | required |
| by | str or list of str | Category variable(s) defining cells. | required |
| weights | str or None | Cell population weights. If None, each row treated as individual-level data (weight = 1). | None |
| normalize | ('symmetric', 'a', 'b') | | 'symmetric' |
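Kitagawa's two-factor split with symmetric averaging can be worked through by hand. The sketch below uses hypothetical pre-aggregated cells and assumes that symmetric averaging is what `normalize='symmetric'` refers to; it is an arithmetic illustration, not the library's code.

```python
# Hypothetical cells: (population share, rate) per age band for groups A and B
cells_a = {"young": (0.5, 0.02), "middle": (0.3, 0.05), "old": (0.2, 0.20)}
cells_b = {"young": (0.2, 0.03), "middle": (0.3, 0.06), "old": (0.5, 0.25)}

crude_a = sum(w * r for w, r in cells_a.values())
crude_b = sum(w * r for w, r in cells_b.values())

rate_effect = comp_effect = 0.0
for k in cells_a:
    (wa, ra), (wb, rb) = cells_a[k], cells_b[k]
    rate_effect += (ra - rb) * (wa + wb) / 2     # rate gap at average composition
    comp_effect += (wa - wb) * (ra + rb) / 2     # composition gap at average rates

print(crude_a - crude_b, rate_effect, comp_effect)
```

The two effects sum exactly to the crude rate difference, with no interaction residual, which is the appeal of the symmetric form over using either group's shares or rates as the sole standard.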
das_gupta ¶
Das Gupta (1993) multi-factor decomposition.
Decomposes the difference in a product-form aggregate into each factor's contribution using symmetric averaging across all possible orderings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_a | pd.DataFrame | DataFrame with the same factor columns as data_b. Each row contributes the factor value. The aggregate for each group is computed as Σ_i ∏_f factor_{f,i}. For single-row DataFrames (one population, no stratification) the aggregate is simply ∏_f factor_f. | required |
| data_b | pd.DataFrame | DataFrame with the same factor columns as data_a. Each row contributes the factor value. The aggregate for each group is computed as Σ_i ∏_f factor_{f,i}. For single-row DataFrames (one population, no stratification) the aggregate is simply ∏_f factor_f. | required |
| factor_names | list of str | Factor column names. | required |
Notes
Assumes: rate = f_1 * f_2 * ... * f_m (aggregate product form).
For additive forms use kitagawa_decompose.
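For a single-row, three-factor product, the symmetric averaging over orderings can be written out directly. This is a sketch with hypothetical factor values, not the library's `das_gupta`: it switches one factor at a time from B's level to A's and averages each factor's marginal change over all orderings.

```python
from itertools import permutations
from math import prod

# Hypothetical single-row populations with rate = f1 * f2 * f3
a = {"f1": 0.6, "f2": 0.5, "f3": 0.10}
b = {"f1": 0.4, "f2": 0.7, "f3": 0.15}
factors = list(a)

effects = dict.fromkeys(factors, 0.0)
orderings = list(permutations(factors))
for order in orderings:
    current = dict(b)                        # start from population B
    for f in order:
        before = prod(current.values())
        current[f] = a[f]                    # switch one factor to A's level
        effects[f] += (prod(current.values()) - before) / len(orderings)

total_diff = prod(a.values()) - prod(b.values())
print(total_diff, effects)
```

Each ordering telescopes from B's aggregate to A's, so the averaged factor effects sum to the total difference regardless of the number of factors.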
gap_closing ¶
gap_closing(data: DataFrame, y: str, group: str, x: Sequence[str], method: str = 'aipw', target_dist: int = 1, trim: float = 0.001, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> GapClosingResult
Lundberg (2022) gap-closing estimator.
Computes the counterfactual mean gap that would remain if one group's covariate distribution were shifted to match the other's.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Column name. | required |
| group | str | Column name. | required |
| x | Sequence[str] | Column names. | required |
| method | ('regression', 'ipw', 'aipw') | AIPW is doubly robust (recommended). | 'aipw' |
| target_dist | (0, 1) | | 1 |
| trim | float | Propensity trim. | 0.001 |
| inference | ('analytical', 'bootstrap', 'none') | | 'analytical' |
mediation_decompose ¶
mediation_decompose(data: DataFrame, y: str, treatment: str, mediator: str, covariates: Optional[Sequence[str]] = None, inference: str = 'analytical', n_boot: int = 299, alpha: float = 0.05, seed: Optional[int] = 12345) -> MediationDecompResult
Linear nested-models mediation decomposition (VanderWeele 2014 four-way simplified to natural direct / indirect under linearity).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Continuous outcome. | required |
| treatment | str | Binary exposure. | required |
| mediator | str | Mediator. | required |
| covariates | list of str or None | | None |
| inference | ('analytical', 'bootstrap') | | 'analytical' |
Returns:
| Type | Description |
|---|---|
| MediationDecompResult | Result with NDE, NIE, CDE, proportion mediated. |
Notes
Under the purely linear model used here, the controlled direct
effect CDE(m*) evaluated at the reference level m* = E[M | A=0]
coincides numerically with the natural direct effect (NDE). The
cde field is therefore redundant in this implementation — it is
retained for API compatibility with VanderWeele's four-way
decomposition, but users should not treat it as independent
information from nde unless a nonlinear or
interaction-heterogeneous extension is added.
disparity_decompose ¶
disparity_decompose(data: DataFrame, y: str, group: str, mediator: str, covariates: Optional[Sequence[str]] = None, target_level: Optional[float] = None) -> DisparityDecompResult
Jackson & VanderWeele (2018) causal disparity decomposition.
Decomposes an observed group disparity in Y into:
- initial disparity: what would remain if mediator M were set to a reference level (e.g. Group A's M distribution).
- mediator-attributable: the complementary share.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome. | required |
| group | str | Binary group (0/1, where 1 = disadvantaged). | required |
| mediator | str | Mediator. | required |
| covariates | list or None | | None |
| target_level | float or None | Value at which to fix mediator for the "initial" counterfactual. Default: mean of M in reference group (group=0). | None |
yu_elwert_decompose ¶
yu_elwert_decompose(data: DataFrame, y: str, treatment: str, group: str, x: Sequence[str], *, method: str = 'plugin', inference: str = 'bootstrap', n_boot: int = 499, alpha: float = 0.05, trim: float = 0.005, cluster: Optional[str] = None, seed: Optional[int] = 12345) -> YuElwertResult
Nonparametric causal decomposition of a group disparity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel with one row per observation. | required |
| y | str | Name of the (continuous) outcome column. | required |
| treatment | str | Binary treatment indicator (0/1). | required |
| group | str | Binary group indicator (0/1) — | required |
| x | sequence of str | Adjustment covariates (used to identify within-group CATEs). | required |
| method | ('plugin', 'efficient') | | "plugin" |
| inference | ('bootstrap', 'none') | | "bootstrap" |
| n_boot | int | | 499 |
| alpha | float | Significance level; intervals have two-sided coverage 1 - alpha. | 0.05 |
| trim | float | Lower/upper clip for fitted propensities (only used in | 0.005 |
| cluster | str or None | Column name to use for cluster bootstrap. | None |
| seed | int or None | | 12345 |
Returns:
| Type | Description |
|---|---|
| YuElwertResult | |
Notes
Identification requires conditional ignorability of treatment given
(R, X) (no unmeasured confounders within group). The framework
does not require R itself to be unconfounded, distinguishing
it from causal-mediation approaches.
The "selection" component is zero whenever individuals are randomly assigned to treatment (no selection on individual gain) or whenever the CATE is constant within group (no heterogeneity to select on). A non-zero selection term — particularly one of opposite sign in the two groups — flags that targeting differs systematically across groups, often the lever a designer can pull.
Examples:
decompose ¶
Unified entry point for all decomposition methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | One of the methods listed in | required |
| **kwargs | | Method-specific keyword arguments (see individual function signatures for details). | {} |
Returns:
| Type | Description |
|---|---|
| Method-specific result class | Provides ``.summary()``, ``.plot()``, ``.to_latex()``, ``._repr_html_()``. |
Examples:
>>> import statspai as sp
>>> df = sp.decomposition.datasets.cps_wage()
>>> r = sp.decompose('oaxaca', data=df, y='log_wage', group='female',
... x=['education', 'experience', 'tenure'])
>>> r.summary()
>>> r = sp.decompose('ffl', data=df, y='log_wage', group='female',
... x=['education', 'experience', 'tenure'],
... stat='quantile', tau=0.5)
>>> r.summary()
>>> # NOTE: ``method='aipw'`` below is passed through to
>>> # ``gap_closing``'s own ``method`` parameter; the dispatcher's
>>> # own method arg is positional-only so there is no collision.
>>> r = sp.decompose('gap_closing', data=df, y='log_wage',
... group='female',
... x=['education', 'experience', 'tenure'],
... method='aipw')
Convention warning for dfl vs machado_mata / melly /
cfm: reference=0 has different semantics across method
families (reweighting vs coefficient-swap). See the per-method
docstrings before comparing composition/structure estimates across
methods.
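The note in the example about `method='aipw'` relies on Python's positional-only parameter syntax. The following is a minimal sketch of how such a dispatcher could be written; the registry `_REGISTRY` and the stand-in `gap_closing` function are hypothetical and say nothing about how StatsPAI actually registers its methods.

```python
def gap_closing(*, method="aipw", **kwargs):
    """Hypothetical stand-in for an estimator with its own ``method`` argument."""
    return {"estimator": "gap_closing", "method": method}

_REGISTRY = {"gap_closing": gap_closing}   # hypothetical registry

def decompose(method, /, **kwargs):
    """Dispatch on the positional-only ``method``; forward everything else."""
    try:
        func = _REGISTRY[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; choose from {sorted(_REGISTRY)}"
        ) from None
    return func(**kwargs)

# The keyword ``method`` lands in **kwargs and reaches the estimator untouched.
result = decompose("gap_closing", method="aipw")
print(result)
```

Because of the `/` in the signature, the first argument can only be supplied positionally, so a keyword named `method` is free to travel through `**kwargs` to the underlying estimator.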
available_methods ¶
Return list of all registered decomposition method names.
cps_wage ¶
CPS-style wage data with a gender gap.
Columns:
- female : int {0, 1}
- education : years
- experience : years
- tenure : years
- union : int {0, 1}
- married : int {0, 1}
- log_wage : float
chilean_households ¶
Chilean-style household income with urban/rural gap.
Columns:
- rural : int {0, 1}
- head_education : years
- head_age : years
- household_size : int
- log_income : float
mincer_wage_panel ¶
Two-period Mincer wage distribution with a structural shift.
Useful for DFL / FFL examples where Group 0 is early period and Group 1 is late period.
Columns:
- period : int {0, 1}
- education : years
- experience : years
- union : int
- occupation_high_skill : int
- log_wage : float
disparity_panel ¶
Synthetic disparity panel with treatment, mediator, outcome.
Columns:
- group : int {0, 1} (disadvantaged = 1)
- education : years (mediator)
- parent_income : float (confounder)
- age : years (confounder)
- income : float (outcome)