statspai.bayes

bayes

Bayesian causal inference (statspai.bayes).

PyMC-backed Bayesian estimators for the canonical causal designs. PyMC and ArviZ are optional dependencies — importing this sub-package never imports them. Each estimator resolves PyMC at call time and raises :class:ImportError with the install recipe if the extras are missing.

Install with:

pip install "statspai[bayes]"
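The call-time resolution described above can be sketched as follows. This is a minimal illustration of the lazy-import pattern, not the statspai internals; the helper name require_optional and the wording of the error message are assumptions made for the example.

```python
import importlib


def require_optional(module_name: str, extra: str = "bayes"):
    """Import an optional dependency on demand, or fail with an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this estimator. "
            f'Install it with: pip install "statspai[{extra}]"'
        ) from exc


def fit_something():
    # The heavy dependency is resolved inside the call, so importing the
    # enclosing module never imports PyMC.
    pm = require_optional("pymc")
    return pm
```

Because the import happens inside the function body, a user without the extras can still import the package and only sees the error when an estimator is actually called.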

Available estimators:

  • :func:bayes_did — 2×2 and panel difference-in-differences with optional hierarchical random effects on unit / time.
  • :func:bayes_rd — sharp regression discontinuity with local polynomial + Normal prior on the jump.

BayesianCausalResult dataclass

Summary of a Bayesian causal fit.

A sibling of :class:statspai.core.results.CausalResult that speaks posterior (HDI, posterior probabilities, convergence diagnostics) instead of frequentist (CI, p-value).

Attributes:

Name Type Description
method str

Human-readable method tag, e.g. "Bayesian DID (panel)".

estimand str

Name of the causal estimand, e.g. "ATT" / "LATE".

posterior_mean, posterior_median, posterior_sd float

Central tendency + dispersion of the posterior on the causal parameter.

hdi_lower, hdi_upper float

Endpoints of the 95 % highest density interval.

prob_positive float

Posterior probability that the estimand is > 0.

prob_rope float | None

Posterior probability that the estimand lies in a user-supplied region of practical equivalence.

rhat float

Gelman-Rubin potential scale reduction factor. Warn if > 1.01.

ess float

Effective sample size (bulk).

n_obs int

Sample size used in the fit.

hdi_prob float

Nominal HDI coverage (default 0.95).

trace InferenceData | None

Full posterior trace for downstream plotting / diagnostics.

model_info dict

Misc. fit metadata (draws, tune, chains, priors, ...).
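To make the posterior-facing attributes concrete, here is a rough sketch of how such summaries can be computed from a one-dimensional array of posterior draws. The function names hdi and summarise are hypothetical, the HDI routine assumes a unimodal posterior, and the library may compute these quantities differently (for example through ArviZ).

```python
from typing import Dict, Optional, Tuple

import numpy as np


def hdi(draws, prob: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval holding `prob` of the draws (unimodal case)."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))            # draws that must fall inside
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k sorted draws
    i = int(np.argmin(widths))            # left edge of the narrowest window
    return float(x[i]), float(x[i + k - 1])


def summarise(draws, rope: Optional[Tuple[float, float]] = None,
              hdi_prob: float = 0.95) -> Dict[str, Optional[float]]:
    d = np.asarray(draws, dtype=float)
    lo, hi = hdi(d, hdi_prob)
    prob_rope = None
    if rope is not None:
        # P(lo < estimand < hi | data), estimated as a share of draws
        prob_rope = float(np.mean((d > rope[0]) & (d < rope[1])))
    return {
        "posterior_mean": float(d.mean()),
        "posterior_median": float(np.median(d)),
        "posterior_sd": float(d.std(ddof=1)),
        "hdi_lower": lo,
        "hdi_upper": hi,
        "prob_positive": float(np.mean(d > 0)),
        "prob_rope": prob_rope,
    }


# Synthetic draws standing in for a real posterior.
rng = np.random.default_rng(0)
example = summarise(rng.normal(1.0, 0.5, size=4000), rope=(-0.1, 0.1))
```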

tidy

tidy(conf_level: Optional[float] = None) -> DataFrame

Single-row DataFrame: term, estimate, std_error, HDI endpoints.

Parameters:

Name Type Description Default
conf_level float

Not used for Bayesian output (HDI has its own level set at fit time). Accepted for API parity with CausalResult.tidy.

None

glance

glance() -> DataFrame

Single-row DataFrame with fit-level diagnostics.

summary

summary() -> str

Printable summary.

BayesianDIDResult dataclass

Bases: BayesianCausalResult

Bayesian DID result with optional per-cohort ATT posteriors.

Extends :class:BayesianCausalResult with a cohort_summaries mapping cohort-label -> summary-dict populated by :func:bayes_did when called with cohort=.... When the dict is empty (default), the result behaves exactly like :class:BayesianCausalResult and tidy() produces the same single-row DataFrame as before.

The cohort_labels field preserves the user's original cohort values in the order the model sampled them; iteration order is deterministic and matches the posterior variable axis.

tidy

tidy(conf_level: Optional[float] = None, terms: Any = None) -> DataFrame

Tidy summary with optional per-cohort breakdown.

Parameters:

Name Type Description Default
conf_level float

Unused for Bayesian output; accepted for API parity with :class:CausalResult.tidy.

None
terms None | str | sequence of str
  • None / 'att' — single average-ATT row (v0.9.14 behaviour).
  • 'per_cohort' — one row per cohort (requires cohort_summaries to be populated; otherwise raises).
  • list like ['att', 'cohort:2019', 'cohort:2020'] — explicit selection. Cohort labels use the prefix 'cohort:' followed by the user's original cohort value coerced to string. Unknown cohort labels raise.
``None``
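The selection rules for terms can be mimicked with a small stand-alone function. This is only a sketch of the documented behaviour: resolve_terms is a hypothetical name, and raising ValueError is an assumption, since the text above only says that invalid requests raise.

```python
from typing import List, Sequence, Union


def resolve_terms(terms: Union[None, str, Sequence[str]],
                  cohort_labels: Sequence[object]) -> List[str]:
    """Map a `terms` argument to the list of row labels it selects."""
    known = [f"cohort:{c}" for c in cohort_labels]  # labels coerced to str
    if terms is None or terms == "att":
        return ["att"]
    if terms == "per_cohort":
        if not known:
            raise ValueError("'per_cohort' needs a fit made with cohort=...")
        return known
    requested = [terms] if isinstance(terms, str) else list(terms)
    unknown = [t for t in requested if t != "att" and t not in known]
    if unknown:
        raise ValueError(f"unknown terms: {unknown}")
    return requested


rows = resolve_terms(["att", "cohort:2019"], cohort_labels=[2019, 2020])
```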

BayesianHTEIVResult dataclass

Bases: BayesianCausalResult

Extension of :class:BayesianCausalResult for heterogeneous-effect IV.

Carries the average LATE (inherited) plus a table of CATE-slope posteriors — one row per effect modifier.

predict_cate

predict_cate(values: Dict[str, float]) -> Dict[str, float]

Posterior summary of CATE at specific modifier values.

Parameters:

Name Type Description Default
values dict[str, float]

Map from modifier name → value. Missing modifiers default to the sample mean (i.e. zero contribution after centring).

required

Returns:

Type Description
dict

Keys {'mean', 'median', 'sd', 'hdi_low', 'hdi_high', 'prob_positive'}.
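As a sketch of what evaluating the CATE at given modifier values involves, the snippet below combines draws of the average LATE with draws of the modifier slopes. The names cate_draws, tau0, tau_hte and modifier_means are assumptions for the example, as is the linear form tau0 + sum of slope times centred value; the HDI keys are left out for brevity.

```python
from typing import Mapping

import numpy as np


def cate_draws(tau0, tau_hte: Mapping[str, np.ndarray],
               modifier_means: Mapping[str, float],
               values: Mapping[str, float]) -> np.ndarray:
    """Posterior draws of the CATE at the requested modifier values."""
    cate = np.array(tau0, dtype=float)  # copy of the average-LATE draws
    for name, slope in tau_hte.items():
        if name in values:
            # Centred modifier: a value equal to the sample mean adds nothing,
            # which is also what happens when the modifier is omitted.
            centred = values[name] - modifier_means[name]
            cate = cate + np.asarray(slope, dtype=float) * centred
    return cate


# Synthetic draws standing in for a fitted model.
rng = np.random.default_rng(4)
tau0 = rng.normal(1.0, 0.2, size=2000)
tau_hte = {"age": rng.normal(0.05, 0.01, size=2000)}
draws = cate_draws(tau0, tau_hte, {"age": 40.0}, {"age": 50.0})
summary = {
    "mean": float(draws.mean()),
    "median": float(np.median(draws)),
    "sd": float(draws.std(ddof=1)),
    "prob_positive": float(np.mean(draws > 0)),
}
```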

BayesianIVResult dataclass

Bases: BayesianCausalResult

Bayesian IV result with optional per-instrument LATE posteriors.

Extends :class:BayesianCausalResult with an instrument_summaries mapping instrument-name -> summary-dict populated by :func:bayes_iv when called with per_instrument=True. When the dict is empty (default), behaviour matches v0.9.15 exactly.

tidy

tidy(conf_level: Optional[float] = None, terms: Any = None) -> DataFrame

Tidy summary with optional per-instrument breakdown.

Parameters:

Name Type Description Default
terms None | str | sequence of str
  • None / 'late' — single pooled-LATE row.
  • 'per_instrument' — one row per instrument (requires a fit with per_instrument=True).
  • list like ['late', 'instrument:z1', 'instrument:z2'] — explicit selection. Unknown labels raise.
``None``

BayesianMTEResult dataclass

Bases: BayesianCausalResult

Bayesian Marginal Treatment Effect result.

Carries (in addition to the inherited average MTE summary) a full posterior over the MTE curve tau(u) on a user-specified grid, plus integrated summaries over the treated / untreated / average populations (ATT, ATU, ATE).

summary

summary() -> str

Printable summary with ATT/ATU uncertainty appended.

Extends BayesianCausalResult.summary with a block printing ATT / ATU posterior mean, SD and HDI when the SD fields are finite. Silently skipped (parent summary only) when either side of the population is empty or the result was deserialised from a pre-v0.9.13 snapshot (fields default NaN).

tidy

tidy(conf_level: Optional[float] = None, terms: Any = None) -> DataFrame

Broom-style tidy summary with optional multi-term output.

Extends :meth:BayesianCausalResult.tidy so a single fit can emit a long-format DataFrame for ATE / ATT / ATU in one call, which is what downstream meta-analysis pipelines (pd.concat + modelsummary, gt, etc.) expect.

Parameters:

Name Type Description Default
conf_level float

Unused for Bayesian output; accepted for API parity with :class:CausalResult.tidy.

None
terms None | str | sequence of str

Which term(s) to include:

  • None or 'ate' — single ATE row (back-compat with v0.9.14 default).
  • 'att' / 'atu' — single row of that term.
  • list like ['ate', 'att', 'atu'] — multi-row.

Unknown names raise :class:ValueError. When an ATT / ATU term is requested but the corresponding SD field is NaN (empty subpopulation, or result deserialised from pre-v0.9.13), the row is still emitted with NaN uncertainty columns — the schema stays rectangular so downstream concat doesn't misalign columns.

``None``

policy_effect

policy_effect(weight_fn, label: str = 'policy', rope: Optional[Tuple[float, float]] = None) -> Dict[str, float]

Posterior summary of a policy-relevant treatment effect.

Computes E[w(U) * MTE(U)] / E[w(U)] as a posterior quantity by reusing the fit's posterior draws on b_mte and the stored u_grid / poly_u. The numerator and denominator are evaluated on the same grid so integration weights cancel.

Parameters:

Name Type Description Default
weight_fn callable

Vectorised function u -> weights, where u is a numpy array on self.u_grid. Values outside [0, 1] are still valid (the grid dictates the support).

required
label str

Identifier propagated into the returned summary dict.

'policy'
rope (float, float)

Region of practical equivalence for the policy effect.

None

Returns:

Type Description
dict

Keys: label, estimate, std_error, hdi_low, hdi_high, prob_positive, plus prob_rope if rope given.
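The weighted-average computation behind policy_effect can be sketched with plain NumPy. This assumes b_mte holds one row of polynomial coefficients per posterior draw and that the polynomial uses raw powers of u (1, u, u², ...); both are assumptions for illustration, and policy_effect_draws is a hypothetical name.

```python
import numpy as np


def policy_effect_draws(b_mte, u_grid, weight_fn) -> np.ndarray:
    """Per-draw value of sum_u w(u) * MTE(u) / sum_u w(u) on a shared grid."""
    b = np.asarray(b_mte, dtype=float)            # shape (draws, poly_u + 1)
    u = np.asarray(u_grid, dtype=float)           # shape (grid,)
    basis = np.vander(u, b.shape[1], increasing=True)  # columns 1, u, u^2, ...
    mte = b @ basis.T                             # shape (draws, grid)
    w = np.asarray(weight_fn(u), dtype=float)
    if w.sum() == 0:
        raise ValueError("weight_fn is zero on the whole grid")
    # Same grid in numerator and denominator, so grid spacing cancels.
    return (mte * w).sum(axis=1) / w.sum()


# Synthetic draws of a linear MTE curve: MTE(u) = 1 - 2u plus noise.
rng = np.random.default_rng(5)
b_draws = np.array([1.0, -2.0]) + rng.normal(0, 0.05, size=(1000, 2))
grid = np.linspace(0.05, 0.95, 19)
eff = policy_effect_draws(b_draws, grid, lambda u: (u < 0.52).astype(float))
result = {
    "label": "low_resistance",
    "estimate": float(eff.mean()),
    "std_error": float(eff.std(ddof=1)),
    "prob_positive": float(np.mean(eff > 0)),
}
```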

plot_mte

plot_mte(ax=None, figsize=(8, 5))

Plot the MTE curve with HDI ribbon. Requires matplotlib.

BayesianDMLResult dataclass

Bayesian DML posterior summary.

bayes_did

bayes_did(data: DataFrame, y: str, treat: str, post: str, unit: Optional[str] = None, time: Optional[str] = None, covariates: Optional[List[str]] = None, *, cohort: Optional[str] = None, prior_ate: Tuple[float, float] = (0.0, 10.0), prior_unit_sigma: float = 5.0, prior_time_sigma: float = 5.0, prior_noise: float = 5.0, prior_covariate_sigma: float = 10.0, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, inference: str = 'nuts', advi_iterations: int = 20000, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianDIDResult

Bayesian difference-in-differences via PyMC.

Two model shapes:

  • 2×2 (no unit/time): y = a + b1*treat + b2*post + tau*treat*post + X*beta + eps. Both group effects are Normal(0, prior_covariate_sigma).

  • Panel (unit or time supplied): hierarchical Gaussian random effects replace the dummies. ATT is the coefficient on treat * post.
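To see why tau in the 2×2 equation is the DID estimand, the following sketch simulates data from that equation and recovers tau by ordinary least squares. It is a frequentist illustration with made-up coefficients, not the PyMC model that bayes_did fits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
treat = rng.integers(0, 2, n)       # ever-treated indicator
post = rng.integers(0, 2, n)        # post-period indicator
tau_true = 1.5                      # illustrative value

# y = a + b1*treat + b2*post + tau*treat*post + eps
y = 2.0 + 0.5 * treat + 1.0 * post + tau_true * treat * post + rng.normal(0, 1, n)

# Least-squares fit of the saturated 2x2 model; tau is the interaction slope.
X = np.column_stack([np.ones(n), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
tau_hat = float(coef[3])


def cell_mean(t: int, p: int) -> float:
    return float(y[(treat == t) & (post == p)].mean())


# The same number as a difference of differences in cell means.
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
```

With a wide Normal prior on tau, the posterior mean from the Bayesian fit would be expected to land close to this least-squares value.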

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
treat str

Binary 0/1 indicator for "ever-treated".

required
post str

Binary 0/1 indicator for the post-treatment period.

required
unit str

Panel indices. If supplied, replaces the corresponding main effect with a hierarchical Gaussian random effect. Omit for the 2×2 model.

None
time str

Panel indices. If supplied, replaces the corresponding main effect with a hierarchical Gaussian random effect. Omit for the 2×2 model.

None
covariates list of str

Additional time-varying regressors. Standardised inside PyMC via Normal(0, prior_covariate_sigma) priors.

None
cohort str

Column name identifying each treated unit's cohort (typically the first-treatment period in a staggered design). When supplied, the coefficient on treat*post is replaced by a per-cohort vector tau_c with a shared Normal(prior_ate) prior; the result populates :attr:BayesianDIDResult.cohort_summaries so tidy(terms='per_cohort') returns one row per cohort. Untreated / never-treated units should carry a sentinel cohort value (e.g. -1, 'never') — they are grouped into a single cohort whose τ is still estimated but typically shrinks toward zero because their treat*post is identically zero. The top-level posterior ATT is the size-weighted mean of the cohort τ's over treated units.

None
prior_ate (float, float)

Mean and SD of the Normal prior on tau.

``(0.0, 10.0)``
prior_unit_sigma float

Half-Normal scale on the random-effect SDs.

5.0
prior_time_sigma float

Half-Normal scale on the random-effect SDs.

5.0
prior_noise float

Half-Normal scale on the residual SD.

5.0
prior_covariate_sigma float

Normal SD on fixed effects and covariate slopes.

10.0
rope (float, float)

Region of practical equivalence for the ATT. If supplied the result includes prob_rope = P(lo < tau < hi | data).

None
hdi_prob float
0.95
draws int
2000
tune int
1000
chains int
4
target_accept float

Raise this value if you see divergences.

0.9
random_state int
42
progressbar bool

PyMC progress bar — off by default because this is often called inside notebooks / scripts and noise hurts readability.

False

Returns:

Type Description
BayesianDIDResult

Posterior summary of the ATT.

bayes_rd

bayes_rd(data: DataFrame, y: str, running: str, cutoff: float = 0.0, bandwidth: Optional[float] = None, poly: int = 1, *, prior_tau: Tuple[float, float] = (0.0, 10.0), prior_slope_sigma: float = 10.0, prior_noise: float = 5.0, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, inference: str = 'nuts', advi_iterations: int = 20000, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianCausalResult

Bayesian sharp regression discontinuity.

Within the bandwidth [cutoff - bw, cutoff + bw] the outcome is modelled as a polynomial (default order 1) in running - cutoff with independent slopes on each side. The causal parameter is the jump tau at the cutoff, with a weakly informative Normal prior.
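The local-polynomial structure can be illustrated with a least-squares fit on simulated data. The sketch below uses order 1, separate slopes on each side of the cutoff, and the 0.5 * std(running) window; the data-generating values are invented for the example and the fit is frequentist, not the PyMC model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
running = rng.uniform(-1, 1, n)
cutoff, tau_true = 0.0, 2.0                    # illustrative values
d = (running >= cutoff).astype(float)          # sharp design: treated iff running >= cutoff
y = 1.0 + 0.8 * running + tau_true * d + 0.4 * d * running + rng.normal(0, 0.5, n)

bw = 0.5 * running.std()                       # rule-of-thumb half-width
x = running - cutoff                           # centre the running variable
keep = np.abs(x) <= bw                         # window [cutoff - bw, cutoff + bw]
xk, dk, yk = x[keep], d[keep], y[keep]

# Columns: intercept, jump at the cutoff, left slope, right slope.
X = np.column_stack([np.ones(xk.size), dk, xk * (1 - dk), xk * dk])
coef, *_ = np.linalg.lstsq(X, yk, rcond=None)
tau_hat = float(coef[1])
```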

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
running str

Running / forcing variable column.

required
cutoff float

Threshold above which treatment applies (running >= cutoff).

0.0
bandwidth float

Half-width of the local window around cutoff. If None, uses the rule-of-thumb 0.5 * std(running).

None
poly int

Polynomial order (1 = local linear; 2 = local quadratic). Stata's rdrobust defaults to 1.

1
prior_tau (float, float)

Mean / SD of the Normal prior on the discontinuity.

``(0.0, 10.0)``
prior_slope_sigma float

Prior scales for the polynomial slopes and residual SD.

10.0
prior_noise float

Prior scales for the polynomial slopes and residual SD.

5.0
rope (float, float)

Region of practical equivalence.

None
hdi_prob float
0.95
draws int

See :func:bayes_did.

2000
tune int

See :func:bayes_did.

1000
chains int

See :func:bayes_did.

4
target_accept float

See :func:bayes_did.

0.9
random_state int

See :func:bayes_did.

42
progressbar bool

See :func:bayes_did.

False

Returns:

Type Description
BayesianCausalResult

Posterior summary of the local average treatment effect (LATE) at the cutoff.

bayes_iv

bayes_iv(data: DataFrame, y: str, treat: str, instrument: Union[str, Sequence[str]], covariates: Optional[List[str]] = None, *, per_instrument: bool = False, prior_late: Tuple[float, float] = (0.0, 10.0), prior_first_stage_sigma: float = 5.0, prior_coef_sigma: float = 10.0, prior_noise: float = 5.0, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, inference: str = 'nuts', advi_iterations: int = 20000, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianIVResult

Bayesian linear IV via jointly-modelled first stage + structural equation.

The model:

.. code-block:: text

D_i = pi_0 + pi_Z' * Z_i + pi_X' * X_i + v_i
Y_i = alpha + LATE * D_i + beta_X' * X_i + eps_i
(v_i, eps_i) ~ BivariateNormal(0, Sigma)

Sigma is parameterised via an LKJ prior on the correlation matrix and HalfNormal priors on the two scales, which lets the model identify the LATE from exogenous variation in Z even when D is endogenous. Under a weak instrument (pi_Z ≈ 0) the LATE posterior correctly widens — there's no "weak-instrument F < 10" footgun here; the posterior just gets more uncertain.
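A small simulation from the two equations above shows why the correlated errors matter. With corr(v, eps) different from zero, a naive regression of Y on D is biased, while the instrument-based ratio recovers the structural coefficient. This is a frequentist sketch with invented parameter values, not the joint PyMC model.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
rho = 0.6                                      # corr(v, eps): the source of endogeneity
late_true = 2.0                                # illustrative value

z = rng.normal(size=n)                         # instrument, excluded from the Y equation
v, eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

d = 0.5 + 1.0 * z + v                          # first stage
y = 1.0 + late_true * d + eps                  # structural equation

# Naive slope of Y on D picks up cov(D, eps) / var(D).
ols = float(np.cov(d, y)[0, 1] / np.var(d, ddof=1))
# Ratio of reduced form to first stage uses only variation coming from Z.
iv = float(np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1])
```

Shrinking the first-stage coefficient on z toward zero makes the denominator of the ratio noisy, which is the frequentist counterpart of the widening posterior mentioned above.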

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
treat str

Endogenous treatment / regressor column (continuous or binary).

required
instrument str or sequence of str

One or more instruments. Must be excluded from the structural equation.

required
covariates list of str

Exogenous controls entering both stages.

None
per_instrument bool

When True and multiple instruments are supplied, additionally fits one just-identified Bayesian IV sub-model per instrument and populates :attr:BayesianIVResult.instrument_summaries, letting tidy(terms='per_instrument') emit one LATE row per Z_j. The top-level pooled LATE posterior remains the joint over-identified fit (v0.9.15 behaviour). Each per-instrument sub-fit reuses the same priors and sampler controls as the pooled fit (draws/tune/chains/target_accept/random_state), so runtime scales roughly as (K+1)× the pooled fit.

``False``
prior_late (float, float)

Normal prior on the structural LATE coefficient.

(0.0, 10.0)
prior_first_stage_sigma float

Priors for first-stage coefficients, structural coefficients, and the two residual scales.

5.0
prior_coef_sigma float

Priors for first-stage coefficients, structural coefficients, and the two residual scales.

10.0
prior_noise float

Priors for first-stage coefficients, structural coefficients, and the two residual scales.

5.0
rope (float, float)

Region of practical equivalence.

None
hdi_prob float

Sampler controls — see :func:bayes_did.

0.95
draws int

Sampler controls — see :func:bayes_did.

2000
tune int

Sampler controls — see :func:bayes_did.

1000
chains int

Sampler controls — see :func:bayes_did.

4
target_accept float

Sampler controls — see :func:bayes_did.

0.9
random_state int

Sampler controls — see :func:bayes_did.

42
progressbar bool

Sampler controls — see :func:bayes_did.

False

Returns:

Type Description
BayesianIVResult

Posterior summary on the LATE coefficient.

bayes_fuzzy_rd

bayes_fuzzy_rd(data: DataFrame, y: str, treat: str, running: str, cutoff: float = 0.0, bandwidth: Optional[float] = None, poly: int = 1, *, prior_late: Tuple[float, float] = (0.0, 10.0), regularize_late: bool = False, prior_slope_sigma: float = 10.0, prior_noise: float = 5.0, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, inference: str = 'nuts', advi_iterations: int = 20000, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianCausalResult

Bayesian fuzzy RD — LATE at the cutoff via joint ITT-Y / ITT-D posterior.

Models within-bandwidth observations:

.. code-block:: text

Y_i = a_Y + itt_Y * I(x_i >= c) + poly_Y(x_i - c) + eps_Y
D_i = a_D + itt_D * I(x_i >= c) + poly_D(x_i - c) + eps_D
LATE := itt_Y / itt_D   (deterministic posterior)

Each polynomial gets independent slopes on each side (standard local-linear RD practice). Priors on the ITT jumps are Normal(mean, prior_slope_sigma) (weakly informative); LATE inherits its prior from prior_late but the ratio form means the posterior is data-driven once compliance is non-trivial.
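Because the LATE is a deterministic ratio of two posterior quantities, its spread depends heavily on how far the ITT-D draws sit from zero. The sketch below uses synthetic Normal draws in place of real posteriors (all numbers are invented) to contrast strong and weak compliance.

```python
import numpy as np

rng = np.random.default_rng(3)
S = 4000                                       # number of posterior draws

itt_y = rng.normal(1.0, 0.1, S)                # draws of the outcome jump
itt_d_strong = rng.normal(0.5, 0.05, S)        # compliance jump well away from 0
itt_d_weak = rng.normal(0.05, 0.05, S)         # compliance jump close to 0

late_strong = itt_y / itt_d_strong             # LATE := itt_Y / itt_D, draw by draw
late_weak = itt_y / itt_d_weak


def central_width(x, prob: float = 0.95) -> float:
    """Width of the equal-tailed interval holding `prob` of the draws."""
    lo, hi = np.percentile(x, [100 * (1 - prob) / 2, 100 * (1 + prob) / 2])
    return float(hi - lo)


width_strong = central_width(late_strong)
width_weak = central_width(late_weak)
```

The weak-compliance ratio has draws where the denominator is near zero or changes sign, which is the kind of ratio explosion that regularize_late=True is meant to dampen.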

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
treat str

Realised treatment uptake (0/1). May disagree with running >= cutoff when compliance is partial.

required
running str

Running variable.

required
cutoff float
0.0
bandwidth float

Default: 0.5 * std(running).

None
poly int
1
prior_late (float, float)

Mean / SD of a regularising Normal prior on the LATE. Only applied when regularize_late=True.

(0.0, 10.0)
regularize_late bool

If True, add an explicit Normal(mu_late, sigma_late) soft-prior on the deterministic ratio itt_Y / itt_D. OFF by default — the default posterior is driven purely by the data and the priors on itt_Y / itt_D. Turn on if you observe ratio explosions under weak compliance; note that the resulting posterior will be biased toward mu_late when the data is only weakly informative (stacks on top of the implicit prior through the constituents).

``False``
prior_slope_sigma float
10.0
prior_noise float
5.0
rope Optional[Tuple[float, float]]
None
hdi_prob float
0.95
draws int
2000
tune int
1000
chains int
4
target_accept float
0.9
random_state int
42
progressbar bool

See :func:bayes_did.

False

Returns:

Type Description
BayesianCausalResult

Posterior over the LATE.

bayes_hte_iv

bayes_hte_iv(data: DataFrame, y: str, treat: str, instrument: Union[str, Sequence[str]], effect_modifiers: Sequence[str], covariates: Optional[List[str]] = None, *, prior_late: Tuple[float, float] = (0.0, 10.0), prior_hte_sigma: float = 5.0, prior_coef_sigma: float = 10.0, prior_noise: float = 5.0, inference: str = 'nuts', advi_iterations: int = 20000, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianHTEIVResult

Bayesian IV with linear CATE-by-covariate heterogeneity.

Parameters:

Name Type Description Default
data DataFrame

Same semantics as :func:bayes_iv.

required
y str

Same semantics as :func:bayes_iv.

required
treat str

Same semantics as :func:bayes_iv.

required
instrument str or sequence of str

Same semantics as :func:bayes_iv.

required
covariates list of str

Same semantics as :func:bayes_iv.

None
effect_modifiers sequence of str

Covariates whose linear interaction with treat is the heterogeneity signal. The model centres these at their sample mean so tau_0 is interpretable as the average LATE at modifiers = mean.

required
prior_hte_sigma float

Prior SD on each element of tau_hte (the slope of LATE on the corresponding modifier).

5.0
inference ('nuts', 'advi')

Sampler. ADVI is a fast mean-field approximation; see the module-level caveat.

'nuts'
advi_iterations int

ADVI iterations (ignored for NUTS).

20000
prior_late Tuple[float, float]
(0.0, 10.0)
prior_coef_sigma float
10.0
prior_noise float
5.0
rope Optional[Tuple[float, float]]
None
hdi_prob float
0.95
draws int
2000
tune int

See :func:bayes_did.

1000
chains int

See :func:bayes_did.

4
target_accept float

See :func:bayes_did.

0.9
random_state int

See :func:bayes_did.

42
progressbar bool

See :func:bayes_did.

False

Returns:

Type Description
BayesianHTEIVResult

BayesianCausalResult with a cate_slopes DataFrame and predict_cate(values) method.

bayes_mte

bayes_mte(data: DataFrame, y: str, treat: str, instrument: Union[str, Sequence[str]], covariates: Optional[List[str]] = None, *, first_stage: str = 'plugin', mte_method: str = 'polynomial', selection: str = 'uniform', u_grid: Optional[ndarray] = None, poly_u: Optional[int] = None, prior_coef_sigma: float = 10.0, prior_mte_sigma: float = 5.0, prior_noise: float = 5.0, rope: Optional[Tuple[float, float]] = None, hdi_prob: float = 0.95, inference: str = 'nuts', advi_iterations: int = 20000, draws: int = 2000, tune: int = 1000, chains: int = 4, target_accept: float = 0.9, random_state: int = 42, progressbar: bool = False) -> BayesianMTEResult

Bayesian Marginal Treatment Effects via plug-in propensity + polynomial MTE.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome, endogenous binary treatment, scalar instrument.

required
treat str

Outcome, endogenous binary treatment, scalar instrument.

required
instrument str

Outcome, endogenous binary treatment, scalar instrument.

required
covariates list of str

Exogenous controls entering both stages.

None
first_stage ('plugin', 'joint')

Selection-equation strategy.

  • 'plugin' : frequentist logit MLE on (Z, X) -> D computed once; propensities enter the MTE polynomial as fixed constants. Fast and the v0.9.8 behaviour.
  • 'joint' : first-stage logit coefficients live inside the PyMC graph with Normal priors and D ~ Bernoulli(p); the propensity p_i is a Deterministic and the MTE polynomial sees it directly, so first-stage uncertainty propagates into the MTE curve. 2-4× slower than plugin but honest about uncertainty.
'plugin'
mte_method ('polynomial', 'hv_latent', 'bivariate_normal')

MTE parameterisation.

  • 'polynomial' : fit a polynomial in the propensity p_i (v0.9.9 behaviour). Under Heckman-Vytlacil 2005 linear-separable + bivariate-normal errors this equals MTE(p); under arbitrary heterogeneity it is LATE-at-propensity g(p), NOT the textbook MTE(u) = E[Y_1 - Y_0 | U_D = u].
  • 'hv_latent' : sample a latent U_D_i per unit from the HV-correct truncated uniform (via raw_U_i ~ U(0,1) and a deterministic reparameterisation), then evaluate the polynomial at U_D_i. This recovers the textbook MTE polynomial under linear-separable HV. Slower (adds shape-n latent; O(n·draws·chains) memory) but mathematically faithful. A UserWarning is emitted if the expected latent storage exceeds ~50M floats.
  • 'bivariate_normal' : full textbook Heckman-Vytlacil trivariate-normal model (U_0, U_1, V) ~ N(0, Σ) with D = 1{Z'π > V}. Identifies the structural intercept β_D = μ_1 - μ_0 and the two error covariances σ_0V, σ_1V, so MTE(v) = β_D + (σ_1V - σ_0V)·v is closed-form linear (no user polynomial choice). Requires selection='normal' (V-scale is the natural fit scale) and first_stage='joint' (to identify V). The structural equation gets inverse-Mills-ratio correction terms -D·σ_1V·λ_1(p) + (1-D)·σ_0V·λ_0(p) with λ_1(p) = φ(Φ^{-1}(p))/p and λ_0(p) = φ(Φ^{-1}(p))/(1-p). poly_u is ignored (the model is inherently linear in V). Exposes b_mte as a 2-vector Deterministic [β_D, σ_1V - σ_0V] so downstream mte_curve, ATT/ATU and policy_effect code paths work without change.
'polynomial'
u_grid ndarray

Grid of propensity-to-be-treated values on which to evaluate the MTE posterior. Default: np.linspace(0.05, 0.95, 19).

None
poly_u int

Polynomial order of the MTE in U_D. poly_u=0 reduces to a constant treatment effect; poly_u=2 captures U-shaped or inverted-U selection on gains. Default resolves per mte_method: 2 for 'polynomial' / 'hv_latent', and 1 for 'bivariate_normal' (the model is inherently linear in V there and ignores any other value). Passing an explicit non-1 value with mte_method='bivariate_normal' emits a UserWarning before the override.

None
prior_mte_sigma float

SD on each MTE polynomial coefficient (Normal prior).

5.0
prior_coef_sigma float

Priors on structural intercept + covariate slopes + residual SD.

10.0
prior_noise float

Priors on structural intercept + covariate slopes + residual SD.

5.0
rope Optional[Tuple[float, float]]
None
hdi_prob float
0.95
inference str
'nuts'
advi_iterations int
20000
draws int
2000
tune int
1000
chains int
4
target_accept float

See :func:bayes_did.

0.9
random_state int

See :func:bayes_did.

42
progressbar bool

See :func:bayes_did.

False

Returns:

Type Description
BayesianMTEResult

With .mte_curve (DataFrame on u_grid), .ate, .att, .atu, and .plot_mte(). The inherited :class:BayesianCausalResult summary fields carry the integrated average MTE (ATE).

bayes_dml

bayes_dml(data: DataFrame, *, y: str, treatment: str, covariates: Sequence[str], model: str = 'plr', prior_mean: float = 0.0, prior_sd: float = 10.0, mode: str = 'conjugate', alpha: float = 0.05, n_folds: int = 5, random_state: int = 42, n_samples: int = 2000, ml_g: Optional[Any] = None, ml_m: Optional[Any] = None) -> BayesianDMLResult

Bayesian Double Machine Learning estimator.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
treatment str
required
covariates sequence of str
required
model str

Forwarded to :func:sp.dml. One of 'plr' / 'irm' / 'pliv'.

'plr'
prior_mean float

Parameters of the Normal prior on the treatment effect.

0.0
prior_sd float

Parameters of the Normal prior on the treatment effect.

10.0
mode ('conjugate', 'full')

Conjugate uses closed-form Normal-Normal updating on the DML point/SE. Full mode requires PyMC ([bayes] extras) and samples the full posterior over the orthogonal moment equation.

'conjugate'
alpha float
0.05
n_folds int
5
random_state int
42
n_samples int

Posterior draws (used only for mode='full').

2000
ml_g sklearn estimators

Custom nuisance learners forwarded to :func:sp.dml.

None
ml_m sklearn estimators

Custom nuisance learners forwarded to :func:sp.dml.

None

Returns:

Type Description
BayesianDMLResult

Notes

Conjugate posterior — for Normal(mu_0, sigma_0²) prior and a Gaussian likelihood with variance sigma²:

posterior_precision = 1/sigma_0² + 1/sigma²
posterior_mean = (mu_0/sigma_0² + dml_point/sigma²) / posterior_precision
posterior_sd   = 1 / sqrt(posterior_precision)

With a weakly-informative prior (sigma_0 large) the posterior mean collapses to the DML point estimate and the credible interval is (approximately) the DML 95% CI.
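The three formulas above translate directly into a few lines of Python. The sketch below treats the DML standard error as sigma; the function name conjugate_update and the numbers in the usage lines are illustrative assumptions.

```python
import math
from typing import Tuple


def conjugate_update(prior_mean: float, prior_sd: float,
                     dml_point: float, dml_se: float) -> Tuple[float, float]:
    """Normal-Normal update of a Normal(prior_mean, prior_sd^2) prior."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / dml_se ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_mean * prior_precision + dml_point * data_precision
    ) / posterior_precision
    posterior_sd = 1.0 / math.sqrt(posterior_precision)
    return posterior_mean, posterior_sd


# Default-style prior Normal(0, 10^2) against a made-up DML estimate.
mean_wide, sd_wide = conjugate_update(0.0, 10.0, dml_point=2.0, dml_se=0.5)
# A tight prior pulls the posterior mean noticeably toward the prior mean.
mean_tight, sd_tight = conjugate_update(0.0, 0.5, dml_point=2.0, dml_se=0.5)
```

With equal prior and data precision, as in the second call, the posterior mean sits halfway between the prior mean and the DML point estimate.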

References

DiTraglia, F. J., & Liu, L. (2025). "Bayesian Double Machine Learning for Causal Inference." arXiv:2508.12688. (The underlying orthogonal-moments DML construction is due to Chernozhukov et al. 2018, Econometrics Journal.) [@ditraglia2025bayesian]

policy_weight_ate

policy_weight_ate() -> Callable[[ndarray], ndarray]

Uniform weight = 1 on every grid point.

Equivalent to calling policy_effect and getting back the ATE from the current grid; provided for API parity.

policy_weight_subsidy

policy_weight_subsidy(u_lo: float, u_hi: float) -> Callable[[ndarray], ndarray]

Weight = 1 on [u_lo, u_hi]; 0 elsewhere.

Use when the counterfactual policy is "subsidise treatment for units whose propensity-to-treat lies in this band". The returned weights reach the compliers induced by the subsidy.

Parameters:

Name Type Description Default
u_lo float

Band endpoints on the propensity (U_D) scale. Both must lie in [0, 1] and u_lo < u_hi.

required
u_hi float

Band endpoints on the propensity (U_D) scale. Both must lie in [0, 1] and u_lo < u_hi.

required
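A band weight of this kind is simple to write by hand, which can help when checking what a builder returns. The following is a sketch under the stated constraints (both endpoints in [0, 1] and u_lo < u_hi); band_weight is a hypothetical stand-in, not the library function.

```python
from typing import Callable

import numpy as np


def band_weight(u_lo: float, u_hi: float) -> Callable[[np.ndarray], np.ndarray]:
    """Return w(u) = 1 for u in [u_lo, u_hi] and 0 elsewhere."""
    if not (0.0 <= u_lo < u_hi <= 1.0):
        raise ValueError("need 0 <= u_lo < u_hi <= 1")

    def weight_fn(u: np.ndarray) -> np.ndarray:
        u = np.asarray(u, dtype=float)
        return ((u >= u_lo) & (u <= u_hi)).astype(float)

    return weight_fn


grid = np.linspace(0.05, 0.95, 19)
w = band_weight(0.2, 0.4)(grid)        # nonzero only on grid points inside the band
```

A band narrower than the grid spacing can select no grid points at all, so it is worth checking that the weights do not sum to zero before using them.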

policy_weight_prte

policy_weight_prte(shift: float) -> Callable[[ndarray], ndarray]

Stylised PRTE weights — convenience builder, NOT the textbook Carneiro-Heckman-Vytlacil (2011) PRTE.

The textbook CHV 2011 PRTE under a propensity-scale policy shift Δ is

.. code-block:: text

w(u) ∝ [F_P(u) - F_{P+Δ}(u)] / Δ

which depends on the observed propensity distribution F_P of the sample — it is NOT a rectangle and cannot be specified from shift alone. Implementing the true PRTE therefore requires the user to pass their sample's propensity draws, which is deliberately out of scope for a one-arg builder.

What this function returns instead is a convenience rectangle around the mean propensity 0.5:

.. code-block:: text

weight(u) = 1 if u ∈ [0.5 - |shift|/2, 0.5 + |shift|/2]
            0 otherwise

This approximates the "units near the decision margin under a uniform index shift" narrative and is frequently good enough for agent-native exploration. If you need the exact CHV PRTE for a specific F_P, build a bespoke weight_fn with the observed propensity kernel and pass it to :meth:BayesianMTEResult.policy_effect directly — e.g.

.. code-block:: python

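import numpy as np
from scipy.stats import gaussian_kde
fp = gaussian_kde(propensity_sample)  # KDE of the observed propensity scores
def chv_prte(u, delta=0.1):
    # [F_P(u) - F_P(u - delta)] / delta, with F_P the CDF implied by the KDE
    u = np.atleast_1d(u)
    return np.array([fp.integrate_box_1d(x - delta, x) for x in u]) / delta
r.policy_effect(chv_prte, label='chv_prte')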

Parameters:

Name Type Description Default
shift float

Size of the propensity-scale shift. Must be in (-1, 1) and non-zero.

required

policy_weight_marginal

policy_weight_marginal(u_star: float, bandwidth: float = 0.05) -> Callable[[ndarray], ndarray]

Marginal PRTE at a specific propensity level u_star.

Approximates the derivative of the policy effect at u_star by averaging MTE over a narrow band of half-width bandwidth. Useful for sanity-checking selection-on-gains at a specific decision margin.

Parameters:

Name Type Description Default
u_star float

Target propensity level, in [0, 1].

required
bandwidth float

Half-width of the averaging window.

0.05

policy_weight_observed_prte

policy_weight_observed_prte(propensity_sample: ndarray, shift: float, *, bw_method=None) -> Callable[[ndarray], ndarray]

True CHV-2011 PRTE weights from the observed propensity distribution via Gaussian KDE.

Implements the policy-relevant treatment effect weighting (Carneiro-Heckman-Vytlacil 2011, Theorem 1):

.. code-block:: text

w(u) ∝ [F_P(u) - F_{P + Δ}(u)] / Δ
     = [F_P(u) - F_P(u - Δ)] / Δ
     = ∫_{u - Δ}^{u} f_P(s) ds / Δ

where F_P is the CDF of the observed propensity sample, f_P is its density (Gaussian KDE here), and Δ = shift is the scalar propensity-scale policy shift. Intuition: w(u) is the population density of policy-induced compliers at propensity level u.

Compared to :func:policy_weight_prte (a stylised rectangle), this uses the actual sample distribution of propensity, which is what CHV 2011 describes as the correct weighting kernel.

Parameters:

Name Type Description Default
propensity_sample ndarray

1-D array of observed propensity scores in [0, 1]. Typical source: sp.bayes_mte(...).model_info['propensity'] or a direct logit fit on your sample.

required
shift float

Policy-scale propensity shift. Must be non-zero and in (-1, 1). Positive = expand treatment uptake by shift at every propensity level; negative = contraction.

required
bw_method str | float | callable | None

Passed to :class:scipy.stats.gaussian_kde. None uses Scott's rule, which is a good default for smooth propensity densities.

``None``

Returns:

Type Description
callable

A function weight_fn(u: np.ndarray) -> np.ndarray that returns the (non-negative) PRTE weight at each grid point. Negative-density-difference values are clipped at 0 — the integral ∫ w(u) MTE(u) du is ill-defined for negative weights and clipping aligns with the CHV-2011 interpretation when the kernel returns slightly negative tail values due to the shift placing density outside [0, 1].
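For intuition, the weight formula can be reproduced with NumPy and the standard library alone. The sketch below builds the CDF of a Gaussian KDE as an average of normal CDFs, uses a Scott-style rule-of-thumb bandwidth, and clips at zero. The name observed_prte_weight is hypothetical and the library's own implementation may differ in its details.

```python
import math
from typing import Callable

import numpy as np

# Standard normal CDF applied elementwise.
_norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))


def observed_prte_weight(propensity_sample, shift: float) -> Callable:
    p = np.asarray(propensity_sample, dtype=float)
    if shift == 0 or not (-1.0 < shift < 1.0):
        raise ValueError("shift must be non-zero and in (-1, 1)")
    h = p.std(ddof=1) * p.size ** (-1.0 / 5.0)   # rule-of-thumb bandwidth, 1-D

    def kde_cdf(u: np.ndarray) -> np.ndarray:
        # CDF of a Gaussian KDE: mean of normal CDFs centred on each sample point.
        z = (u[:, None] - p[None, :]) / h
        return _norm_cdf(z).mean(axis=1)

    def weight_fn(u) -> np.ndarray:
        u = np.atleast_1d(np.asarray(u, dtype=float))
        w = (kde_cdf(u) - kde_cdf(u - shift)) / shift   # [F_P(u) - F_P(u - shift)] / shift
        return np.clip(w, 0.0, None)

    return weight_fn


# Synthetic propensities standing in for a fitted first stage.
rng = np.random.default_rng(6)
sample = rng.beta(2.0, 2.0, size=300)
grid = np.linspace(0.05, 0.95, 19)
weights = observed_prte_weight(sample, shift=0.1)(grid)
```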

References

Carneiro, P., Heckman, J. J., & Vytlacil, E. J. (2011). Estimating marginal returns to education. AER, 101(6), 2754-2781. [@carneiro2011estimating]