Skip to content

statspai.rd

rd

Regression Discontinuity (RD) module for StatsPAI.

Regression-discontinuity tools exposed through the StatsPAI API:

Core estimation: - Sharp, Fuzzy, and Kink RD estimation with robust bias-corrected inference (CCT 2014) - Covariate-adjusted local polynomial estimation (Calonico et al. 2019) - Donut-hole RD for manipulation near cutoff - Regression Discontinuity in Time (Hausman-Rapson 2018) - Multi-cutoff and multi-score RD (Cattaneo et al. 2024) - Boundary discontinuity / 2D RD designs (Cattaneo-Titiunik-Yu 2025)

Bandwidth selection: - MSE-optimal: mserd, msetwo, msecomb1, msecomb2 - CER-optimal: cerrd, certwo, cercomb1, cercomb2 (Calonico-Cattaneo-Farrell 2020) - Fuzzy-specific and covariate-adjusted bandwidth selection

Inference: - Honest confidence intervals (Armstrong-Kolesar 2018, 2020) - Local randomization inference with Fisher exact tests (Cattaneo-Titiunik-VB 2016) - Window selection and sensitivity analysis - Rosenbaum sensitivity bounds

Treatment effect heterogeneity: - CATE estimation via fully interacted local linear (Calonico et al. 2025) - ML + RD: Causal Forest, Gradient Boosting, LASSO-assisted RD

External validity: - Extrapolation away from cutoff (Angrist-Rokkanen 2015) - Multi-cutoff extrapolation (Cattaneo et al. 2024) - External validity diagnostics

Diagnostics & visualization: - Diagnostic dashboard (rdsummary) - IMSE-optimal binned scatter with pointwise CI bands - Density manipulation testing (CJM 2020) - Bandwidth sensitivity, covariate balance, placebo cutoff tests - Power analysis and sample size calculations

RDMultiResult

Results from multi-cutoff/multi-score RD.

plot

plot(ax=None, **kwargs)

Forest plot of cutoff-specific and pooled estimates.

RDInterferenceResult dataclass

Direct + spillover RDD effects under network interference.

MultiScoreRDResult dataclass

Multi-score RDD effect on the boundary.

DistRDResult dataclass

RDD effect on each quantile.

BayesRDHTEResult dataclass

Bayesian RDD with HTE posterior summaries.

DDDResult dataclass

Distributional discontinuity design output.

rdplot

rdplot(data: DataFrame, y: str, x: str, c: float = 0, nbins: Optional[int] = None, binselect: str = 'esmv', p: int = 4, kernel: str = 'triangular', ci_level: float = 0.95, shade_ci: bool = True, donut: float = 0, show_bw: bool = False, h: Optional[float] = None, covs: Optional[List[str]] = None, weights: Optional[str] = None, hide_ci: bool = False, scatter: bool = True, ax=None, figsize: tuple = (10, 7), title: Optional[str] = None, x_label: Optional[str] = None, y_label: Optional[str] = None)

RD plot: binned scatter with polynomial fit on each side of the cutoff.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome and running variable names.

required
x str

Outcome and running variable names.

required
c float

Cutoff.

0
nbins int

Bins per side. If None, uses data-driven selection via binselect.

None
binselect str

Bin selection method (when nbins=None): - 'es' : IMSE-optimal evenly spaced - 'espr' : IMSE-optimal evenly spaced (mimicking variance) - 'qs' : IMSE-optimal quantile-spaced - 'qspr' : IMSE-optimal quantile-spaced (mimicking variance) - 'esmv' : IMSE-optimal evenly spaced with variance mimicking (default) - 'qsmv' : IMSE-optimal quantile-spaced with variance mimicking

'esmv'
p int

Polynomial order for the fitted curve.

4
kernel str

Kernel for the fitted curve.

'triangular'
ci_level float

Confidence level for pointwise CI bands.

0.95
shade_ci bool

Show confidence interval bands around the polynomial fit.

True
donut float

If > 0, shades the donut region |x - c| <= donut.

0
show_bw bool

If True, shades the bandwidth window.

False
h float

Bandwidth to display.

None
covs list of str

Covariates to partial out before binning and plotting.

None
weights str

Column name for observation weights in polynomial fitting.

None
hide_ci bool

If True, suppress CI bands entirely.

False
scatter bool

Show binned scatter points.

True
ax matplotlib Axes
None
figsize tuple
(10, 7)
title str
None
x_label str
None
y_label str
None

Returns:

Type Description
(fig, ax)

rdplotdensity

rdplotdensity(data: DataFrame, x: str, c: float = 0, p: int = 2, n_grid: int = 50, h: Optional[float] = None, ci_level: float = 0.95, hist: bool = True, nbins: int = 30, ax=None, figsize: tuple = (10, 7), title: Optional[str] = None)

Boundary-adaptive density discontinuity plot at the RD cutoff.

Implements the local-polynomial density estimator of Cattaneo, Jansson & Ma (2020, Journal of the American Statistical Association 115(531), 1449-1455): for each side of the cutoff the empirical CDF F̂ is constructed and a local polynomial of order p is fit to F̂. The slope at the cutoff equals the density f̂(c±), and the asymptotic variance comes from the same local-polynomial sandwich. Boundary-adaptive — does not require a separate boundary kernel.

Parameters:

Name Type Description Default
data DataFrame
required
x str

Running variable.

required
c float

Cutoff.

0
p int

Polynomial order for the CDF regression (p=2 recommended; p=1 is faster but with worse boundary behavior).

2
n_grid int

Grid points per side for the density curve.

50
h float

Bandwidth. If None, side-specific Silverman pilot.

None
ci_level float

Confidence level for CI bands.

0.95
hist bool

Overlay histogram.

True
nbins int

Number of histogram bins per side.

30
ax matplotlib Axes
None
figsize tuple
(10, 7)
title str
None

Returns:

Type Description
(fig, ax)
References

Cattaneo, M.D., Jansson, M. and Ma, X. (2020). "Simple Local Polynomial Density Estimators." JASA 115(531), 1449-1455. [@cattaneo2020simple]

rdbwselect

rdbwselect(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, deriv: int = 0, p: int = 1, q: Optional[int] = None, covs: Optional[List[str]] = None, kernel: str = 'triangular', bwselect: str = 'mserd', cluster: Optional[str] = None, all: bool = False) -> DataFrame

Bandwidth selection for local polynomial RD estimation.

Implements all eight MSE-optimal and CER-optimal bandwidth selection procedures from Calonico, Cattaneo, and Farrell (2020). MSE-optimal bandwidths minimize the mean squared error of the RD point estimator, while CER-optimal bandwidths minimize the coverage error rate of robust bias-corrected confidence intervals.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable column name.

required
x str

Running variable column name.

required
c float

RD cutoff value.

0
fuzzy str

Treatment variable name for fuzzy RD. When provided, bandwidth accounts for first-stage variance in the Wald / IV estimator.

None
deriv int

Derivative order. 0 = standard RD (jump in level), 1 = regression kink design (change in slope).

0
p int

Polynomial order for point estimation (1 = local linear).

1
q int

Polynomial order for bias correction. Default is p + 1.

None
covs list of str

Covariate column names. When provided, the variance estimates used in bandwidth selection account for covariate adjustment, typically yielding narrower bandwidths.

None
kernel str

Kernel function: 'triangular', 'uniform', or 'epanechnikov'.

'triangular'
bwselect str

Bandwidth selection method. One of:

  • 'mserd' : MSE-optimal common bandwidth (default)
  • 'msetwo' : MSE-optimal separate left/right bandwidths
  • 'msecomb1' : min of mserd, mseleft, mseright
  • 'msecomb2' : median of mserd, mseleft, mseright
  • 'cerrd' : CER-optimal common bandwidth
  • 'certwo' : CER-optimal separate left/right
  • 'cercomb1' : min of cerrd, cerleft, cerright
  • 'cercomb2' : median of cerrd, cerleft, cerright
'mserd'
cluster str

Cluster variable name for cluster-robust variance estimation.

None
all bool

If True, compute and return all eight bandwidth types.

False

Returns:

Type Description
DataFrame

DataFrame with columns [method, h_left, h_right, b_left, b_right, n_left, n_right]. When all=False, contains a single row for the selected method. When all=True, contains eight rows for all methods.

Notes

The CER-optimal bandwidth is related to the MSE-optimal bandwidth by:

h_CER = h_MSE * n^{-1/((2p+3)(2p+5))}

For p=1 (local linear), this gives h_CER ~ h_MSE * n^{-1/35}, which is strictly narrower than h_MSE. The narrower bandwidth yields confidence intervals with better coverage properties at the cost of slightly wider intervals.

The MSE-optimal bandwidth has rate n^{-1/(2p+3)} while the CER rate is n^{-1/(2p+3) - 1/((2p+3)(2p+5))}. For large samples the difference is meaningful: CER bandwidths produce robust CIs that achieve their nominal coverage rate, whereas MSE bandwidths can exhibit substantial coverage distortion.

References

Calonico, S., Cattaneo, M.D. and Farrell, M.H. (2020). "Optimal Bandwidth Choice for Robust Bias-Corrected Inference in Regression Discontinuity Designs." Econometrics Journal, 23(2), 192-210. [@calonico2020optimal]

Calonico, S., Cattaneo, M.D. and Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica, 82(6), 2295-2326. [@calonico2014robust]

Examples:

Basic MSE-optimal bandwidth:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> X = rng.uniform(-1, 1, n)
>>> Y = 0.5 * X + 3.0 * (X >= 0) + rng.normal(0, 0.3, n)
>>> df = pd.DataFrame({'y': Y, 'x': X})
>>> bw = sp.rdbwselect(df, y='outcome', x='score', c=0)
>>> print(bw)  # DataFrame with bandwidth info

Compare all eight bandwidth methods:

>>> bw_all = sp.rdbwselect(df, y='outcome', x='score', c=0, all=True)
>>> print(bw_all)  # All 8 bandwidth methods compared

CER-optimal bandwidth for better coverage:

>>> bw_cer = sp.rdbwselect(df, y='outcome', x='score', c=0,
...                         bwselect='cerrd')

Fuzzy RD with covariates:

>>> bw_fuzzy = sp.rdbwselect(df, y='outcome', x='score', c=0,
...                           fuzzy='treated', covs=['age', 'gender'])

rdbwsensitivity

rdbwsensitivity(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, p: int = 1, kernel: str = 'triangular', bw_grid: Optional[List[float]] = None, n_grid: int = 15, bw_range: tuple = (0.5, 2.0), alpha: float = 0.05, ax=None, figsize: tuple = (10, 6)) -> DataFrame

Bandwidth sensitivity analysis for RD estimates.

Re-estimates the RD effect across a grid of bandwidths to assess robustness. Plots point estimates with confidence intervals.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome and running variable names.

required
x str

Outcome and running variable names.

required
c float

Cutoff.

0
fuzzy str

Treatment variable for fuzzy RD.

None
p int

Polynomial order.

1
kernel str
'triangular'
bw_grid list of float

Explicit bandwidth values to evaluate. If None, auto-generates a grid as multiples of the MSE-optimal bandwidth.

None
n_grid int

Number of grid points if bw_grid is None.

15
bw_range tuple

Range of multipliers for the optimal bandwidth.

(0.5, 2.0)
alpha float
0.05
ax matplotlib Axes
None
figsize tuple
(10, 6)

Returns:

Type Description
DataFrame

Columns: bandwidth, estimate, se, ci_lower, ci_upper, pvalue.

rdbalance

rdbalance(data: DataFrame, x: str, c: float = 0, covs: Optional[List[str]] = None, p: int = 1, kernel: str = 'triangular', h: Optional[float] = None, alpha: float = 0.05) -> DataFrame

Covariate balance test at the RD cutoff.

For each covariate, estimates the discontinuity at the cutoff using local polynomial regression. Pre-treatment covariates should show no significant jump at the cutoff if the RD design is valid.

Parameters:

Name Type Description Default
data DataFrame
required
x str

Running variable name.

required
c float

Cutoff.

0
covs list of str

Covariate names to test. If None, tests all numeric columns except x.

None
p int
1
kernel str
'triangular'
h float

Manual bandwidth. If None, uses MSE-optimal per covariate.

None
alpha float
0.05

Returns:

Type Description
DataFrame

Columns: covariate, estimate, se, z, pvalue, significant.

rdplacebo

rdplacebo(data: DataFrame, y: str, x: str, c: float = 0, placebo_cutoffs: Optional[List[float]] = None, n_placebo: int = 10, side: str = 'both', fuzzy: Optional[str] = None, p: int = 1, kernel: str = 'triangular', alpha: float = 0.05, ax=None, figsize: tuple = (10, 6)) -> DataFrame

Placebo cutoff test for RD validity.

Estimates the RD effect at fake cutoff points where no treatment effect should exist. Significant effects at placebo cutoffs suggest the RD design may be invalid.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome and running variable names.

required
x str

Outcome and running variable names.

required
c float

True cutoff.

0
placebo_cutoffs list of float

Explicit placebo cutoff values. If None, auto-generates from the data distribution.

None
n_placebo int

Number of placebo cutoffs if auto-generating.

10
side str

Which side of the true cutoff to place placebos: 'left', 'right', or 'both'.

'both'
fuzzy str

Treatment variable for fuzzy RD.

None
p int
1
kernel str
'triangular'
alpha float
0.05
ax matplotlib Axes
None
figsize tuple
(10, 6)

Returns:

Type Description
DataFrame

Columns: cutoff, estimate, se, ci_lower, ci_upper, pvalue, is_true_cutoff.

rdsummary

rdsummary(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, covs: Optional[List[str]] = None, p: int = 1, kernel: str = 'triangular', alpha: float = 0.05, verbose: bool = True, plot: bool = False, full: bool = False) -> Dict[str, Any]

One-stop RD diagnostic battery.

Runs the main RD estimate plus all standard validation checks in a single call, returning a structured summary.

Diagnostics run (standard): 1. Main RD estimate (conventional + robust) 2. Density manipulation test (CJM 2020) 3. Covariate balance at cutoff (if covs provided) 4. Bandwidth sensitivity

Extended diagnostics (full=True): 5. Honest CI (Armstrong-Kolesár 2020) 6. Power analysis (MDE at 80%) 7. Placebo cutoff tests (5 placebos per side) 8. Multiple bandwidth selectors comparison

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome and running variable.

required
x str

Outcome and running variable.

required
c float

Cutoff.

0
fuzzy str

Treatment variable for fuzzy RD.

None
covs list of str

Pre-treatment covariates for balance test.

None
p int
1
kernel str
'triangular'
alpha float
0.05
verbose bool

Print formatted summary to console.

True
plot bool

Generate a multi-panel diagnostic plot.

False
full bool

Run extended diagnostics (honest CI, power, placebos).

False

Returns:

Type Description
dict with keys:

'estimate': CausalResult from rdrobust 'density_test': CausalResult from rddensity 'balance': pd.DataFrame (if covs given) 'bw_sensitivity': pd.DataFrame 'honest_ci': CausalResult (if full=True) 'power': RDPowerResult (if full=True) 'placebos': pd.DataFrame (if full=True) 'bandwidth_comparison': pd.DataFrame (if full=True) 'figure': matplotlib Figure (if plot=True)

rd_honest

rd_honest(data: DataFrame, y: str, x: str, c: float = 0.0, M: Optional[float] = None, kernel: str = 'triangular', h: Optional[float] = None, alpha: float = 0.05, opt_criterion: str = 'mse') -> CausalResult

Honest confidence intervals for regression discontinuity designs.

Implements Armstrong & Kolesár (2018, 2020): CIs that are valid uniformly over the class of regression functions whose second derivative is bounded by M. These "honest" CIs account for the smoothing bias that standard local-polynomial CIs ignore, yielding correct coverage even in finite samples.

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff.

0
M float

Upper bound on |f''(c)|. If None, estimated from a local quadratic fit on each side of the cutoff.

None
kernel str

Kernel function: "triangular", "epanechnikov", or "uniform".

"triangular"
h float

Bandwidth. If None, chosen by the criterion in opt_criterion.

None
alpha float

Significance level for the confidence interval.

0.05
opt_criterion str

Bandwidth selection criterion when h is None: "mse" (MSE-optimal, Imbens-Kalyanaraman style) or "flci" (minimises honest CI length).

"mse"

Returns:

Type Description
CausalResult

Result object with model_info containing: - honest_ci : tuple – honest confidence interval - naive_ci : tuple – standard CI for comparison - M : float – smoothness bound used - bias_bound: float – estimated worst-case bias - bandwidth : float – bandwidth used

rdmc

rdmc(data: DataFrame, y: str, x: str, cutoffs: List[float], bandwidth: float = None, kernel: str = 'triangular', pooling: str = 'ivw', alpha: float = 0.05) -> RDMultiResult

Multi-cutoff RD design.

Estimates treatment effects at multiple cutoffs and pools them.

Equivalent to rdmulti::rdmc() in R/Stata.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome variable.

required
x str

Running variable.

required
cutoffs list of float

Cutoff values.

required
bandwidth float

Bandwidth for local polynomial. If None, uses Silverman rule.

None
kernel str
'triangular'
pooling str

Pooling method: 'ivw' (inverse-variance weighted) or 'equal'.

'ivw'
alpha float
0.05

Returns:

Type Description
RDMultiResult

Examples:

>>> import statspai as sp
>>> result = sp.rdmc(df, y='score', x='running_var', cutoffs=[50, 70, 90])
>>> print(result.summary())
>>> result.plot()

rdms

rdms(data: DataFrame, y: str, x1: str, x2: str, cutoff1: float = 0, cutoff2: float = 0, bandwidth: float = None, kernel: str = 'triangular', alpha: float = 0.05) -> CausalResult

Multi-score / Geographic RD design.

Handles two-dimensional running variables for geographic boundaries.

Equivalent to rdmulti::rdms() and Keele & Titiunik (2015).

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome variable.

required
x1 str

First running variable (e.g., latitude distance to boundary).

required
x2 str

Second running variable (e.g., longitude distance to boundary).

required
cutoff1 float

Cutoff for x1.

0
cutoff2 float

Cutoff for x2.

0
bandwidth float
None
kernel str
'triangular'
alpha float
0.05

Returns:

Type Description
CausalResult

Examples:

>>> import statspai as sp
>>> result = sp.rdms(df, y='outcome', x1='dist_lat', x2='dist_lon')
>>> print(result.summary())

rdsampsi

rdsampsi(tau: float, var_left: float = 1.0, var_right: float = 1.0, h_left: float = 0.5, h_right: float = 0.5, alpha: float = 0.05, target_power: float = 0.8, ratio: float = 1.0) -> RDSampSiResult

Minimum sample size for a given power in an RD design.

Parameters:

Name Type Description Default
ratio float

n_right / n_left. Default 1.0 assumes equal allocation.

1.0

rdrandinf

rdrandinf(data: DataFrame, y: str, x: str, c: float = 0, wl: Optional[float] = None, wr: Optional[float] = None, statistic: str = 'diffmeans', p: int = 0, covs: Optional[List[str]] = None, kernel: str = 'uniform', n_perms: int = 1000, fuzzy: Optional[str] = None, alpha: float = 0.05, seed: int = 42) -> CausalResult

Randomization inference for regression discontinuity designs.

Under the local randomization assumption, units within a small window around the cutoff are treated as if randomly assigned. Inference is based on Fisher's randomization test.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
wl float

Window left bound offset from cutoff (typically negative). The left edge of the window is c + wl.

None
wr float

Window right bound offset from cutoff (typically positive). The right edge of the window is c + wr.

None
statistic str

Test statistic: 'diffmeans', 'ksmirnov', 'ranksum', or 'all'.

'diffmeans'
p int

Polynomial order for adjustment (0 = unadjusted).

0
covs list of str

Covariate names to partial out before testing.

None
kernel str

Kernel weighting (only 'uniform' currently supported for local randomization).

'uniform'
n_perms int

Number of permutations for Fisher randomization test.

1000
fuzzy str

Actual treatment variable for fuzzy RD. The Wald (IV) estimator is computed within the window.

None
alpha float

Significance level.

0.05
seed int

Random seed for reproducibility.

42

Returns:

Type Description
CausalResult

Result with treatment effect estimate, permutation and asymptotic p-values, and confidence interval.

Notes

The confidence interval is obtained by test inversion when statistic='diffmeans': the set of hypothesised effect values tau_0 that are not rejected by the permutation test at level alpha.

References

Cattaneo, M.D., Titiunik, R. and Vazquez-Bare, G. (2016). "Inference in Regression Discontinuity Designs under Local Randomization." The Stata Journal, 16(2), 331-367. [@cattaneo2016inference]

rdwinselect

rdwinselect(data: DataFrame, x: str, c: float = 0, covs: Optional[List[str]] = None, wmin: Optional[float] = None, wstep: Optional[float] = None, nwindows: int = 10, statistic: str = 'diffmeans', p: int = 0, seed: int = 42, alpha: float = 0.15) -> DataFrame

Data-driven window selection for local randomization RD.

Tests covariate balance at successively larger windows around the cutoff. The recommended window is the largest for which all covariates remain balanced (p > alpha).

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
covs list of str

Covariate names to test balance for. If None, uses quantiles of the running variable as pseudo-covariates.

None
wmin float

Minimum half-window width. Defaults to the smallest gap between adjacent observations near the cutoff.

None
wstep float

Window increment. Defaults to (max_range - wmin) / nwindows.

None
nwindows int

Number of windows to evaluate.

10
statistic str

Test statistic for balance testing.

'diffmeans'
p int

Polynomial order for adjustment.

0
seed int

Random seed.

42
alpha float

Significance level for balance (lenient by default to be conservative about window selection).

0.15

Returns:

Type Description
DataFrame

Columns: window_left, window_right, n_left, n_right, p_value, balanced. Rows sorted by window width.

rdsensitivity

rdsensitivity(data: DataFrame, y: str, x: str, c: float = 0, wlist: Optional[List[float]] = None, nwindows: int = 20, statistic: str = 'diffmeans', p: int = 0, n_perms: int = 500, seed: int = 42, alpha: float = 0.05) -> DataFrame

Sensitivity of RD estimates across different window widths.

For each window, runs rdrandinf and records the estimate, standard error, and p-value. Optionally produces a plot if matplotlib is available.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
wlist list of float

Symmetric half-window widths to evaluate. If None, an evenly-spaced grid is generated automatically.

None
nwindows int

Number of windows when wlist is None.

20
statistic str

Test statistic for inference.

'diffmeans'
p int

Polynomial order for adjustment.

0
n_perms int

Number of permutations per window.

500
seed int

Random seed.

42
alpha float

Significance level.

0.05

Returns:

Type Description
DataFrame

Columns: window, estimate, se, pvalue, ci_lower, ci_upper, significant.

rdrbounds

rdrbounds(data: DataFrame, y: str, x: str, c: float = 0, wl: Optional[float] = None, wr: Optional[float] = None, gamma_list: Optional[List[float]] = None, statistic: str = 'ranksum', n_perms: int = 1000, seed: int = 42) -> DataFrame

Rosenbaum sensitivity bounds for RD under local randomization.

Assesses how much hidden bias (departure from random assignment) would be needed to explain away the estimated treatment effect. For each gamma (odds ratio), computes upper and lower p-value bounds under worst-case confounding.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
wl float

Window left bound offset (typically negative).

None
wr float

Window right bound offset (typically positive).

None
gamma_list list of float

Odds ratios to evaluate. Defaults to [1, 1.5, 2, 2.5, 3, 4, 5]. gamma=1 is pure randomization; gamma>1 allows confounding.

None
statistic str

Test statistic (ranksum is standard for Rosenbaum bounds).

'ranksum'
n_perms int

Number of permutations for p-value computation.

1000
seed int

Random seed.

42

Returns:

Type Description
DataFrame

Columns: gamma, pvalue_upper, pvalue_lower.

Notes

Under Rosenbaum's model, if gamma=Gamma, the probability that unit i is treated satisfies:

1/(1+Gamma) <= P(D_i=1) <= Gamma/(1+Gamma)

instead of the uniform 1/2 under pure randomization. The bounds on the p-value are obtained by computing the worst-case assignment probabilities at each gamma level.

rdhte

rdhte(data: DataFrame, y: str, x: str, z: Union[str, List[str]], c: float = 0, p: int = 1, h: Optional[float] = None, b: Optional[float] = None, kernel: str = 'triangular', bwselect: str = 'mserd', cluster: Optional[str] = None, alpha: float = 0.05, eval_points: Optional[ndarray] = None, n_eval: int = 20) -> CausalResult

Estimate conditional average treatment effects (CATE) in RD designs.

Fits a fully interacted local polynomial model on each side of the cutoff and computes CATE(z) = (alpha_R - alpha_L) + z'(gamma_R - gamma_L).

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
z str or list of str

Covariate(s) for treatment effect heterogeneity.

required
c float

RD cutoff value.

0
p int

Polynomial order for the running variable (1 = local linear).

1
h float

Bandwidth for estimation. If None, MSE-optimal bandwidth is selected.

None
b float

Bandwidth for bias correction. Defaults to h.

None
kernel str

Kernel function: 'triangular', 'uniform', or 'epanechnikov'.

'triangular'
bwselect str

Bandwidth selection method: 'mserd' or 'msetwo'.

'mserd'
cluster str

Cluster variable name for cluster-robust standard errors.

None
alpha float

Significance level for confidence intervals.

0.05
eval_points ndarray

Z values at which to evaluate CATE. Each row is a point in Z-space. If None, n_eval equally spaced quantiles (10th to 90th pctile) are used.

None
n_eval int

Number of evaluation points when eval_points is not provided.

20

Returns:

Type Description
CausalResult
  • estimate: average CATE across evaluation points (the ATE)
  • detail: DataFrame with [z_value, cate, se, ci_lower, ci_upper, pvalue]
  • model_info: coefficients, heterogeneity test, bandwidth info

Examples:

>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> X = rng.uniform(-1, 1, n)
>>> Z = rng.normal(0, 1, n)
>>> tau_z = 2.0 + 1.5 * Z  # CATE varies with Z
>>> Y = 0.5 * X + tau_z * (X >= 0) + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> result = rdhte(df, y='y', x='x', z='z', c=0)
>>> abs(result.estimate - 2.0) < 1.0  # average CATE near 2
True

rdbwhte

rdbwhte(data: DataFrame, y: str, x: str, z: Union[str, List[str]], c: float = 0, p: int = 1, kernel: str = 'triangular') -> float

MSE-optimal bandwidth selection for the fully interacted RD model.

Accounts for the additional variance introduced by the covariate interaction terms compared to the standard local polynomial RD bandwidth.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
z str or list of str

Covariate(s) for treatment effect heterogeneity.

required
c float

RD cutoff value.

0
p int

Polynomial order.

1
kernel str

Kernel function.

'triangular'

Returns:

Type Description
float

MSE-optimal bandwidth.

rdhte_lincom

rdhte_lincom(result: CausalResult, weights: ndarray, alpha: float = 0.05) -> Dict[str, Any]

Compute a weighted linear combination of CATE estimates.

Useful for computing group-specific average treatment effects, e.g., the average CATE for males vs females.

Parameters:

Name Type Description Default
result CausalResult

Result from rdhte().

required
weights ndarray

Linear combination weights. Must have length equal to the number of evaluation points in result.detail.

required
alpha float

Significance level.

0.05

Returns:

Type Description
dict

Keys: 'estimate', 'se', 'ci', 'pvalue'.

Examples:

Compute the average CATE for the first and second halves of Z:

>>> result = rdhte(df, y='y', x='x', z='z')
>>> n_pts = len(result.detail)
>>> w1 = np.zeros(n_pts)
>>> w1[:n_pts//2] = 1.0 / (n_pts//2)  # first half
>>> lincom1 = rdhte_lincom(result, w1)

rd2d_bw

rd2d_bw(data: DataFrame, y: str, x1: str, x2: str, treatment: str, boundary: Optional[Callable] = None, approach: str = 'distance', p: int = 1, kernel: str = 'triangular') -> float

Bandwidth selection for 2D boundary RD.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x1 str

Running variable names.

required
x2 str

Running variable names.

required
treatment str

Binary treatment indicator.

required
boundary callable

Boundary function f(x1) -> x2. None implies x1 = 0.

None
approach str

'distance' or 'location'.

'distance'
p int

Polynomial order.

1
kernel str

Kernel function.

'triangular'

Returns:

Type Description
float

MSE-optimal bandwidth.

rd2d_plot

rd2d_plot(data: DataFrame, y: str, x1: str, x2: str, treatment: str, boundary: Optional[Callable] = None, result: Optional[CausalResult] = None, plot_type: str = 'scatter', ax=None, figsize: tuple = (10, 8)) -> tuple

2D boundary RD visualization.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x1 str

Running variable names.

required
x2 str

Running variable names.

required
treatment str

Binary treatment indicator.

required
boundary callable

Boundary function f(x1) -> x2. None implies x1 = 0.

None
result CausalResult

Result from rd2d(), used for bandwidth and effect info.

None
plot_type str

'scatter': 2D scatter of (x1, x2) colored by treatment status, with boundary curve and optional bandwidth region. 'heatmap': outcome values displayed as a heatmap with boundary overlay. 'boundary_effects': treatment effect estimates along the boundary (requires result with multiple eval points).

'scatter'
ax matplotlib Axes

Pre-existing axes to draw on.

None
figsize tuple

Figure size.

(10, 8)

Returns:

Type Description
(fig, ax)

rd_forest

rd_forest(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, n_trees: int = 500, min_leaf: int = 20, honesty: bool = True, alpha: float = 0.05, seed: int = 42) -> CausalResult

Causal Forest for RD — heterogeneous treatment effect estimation.

Adapts the Athey-Wager (2019) generalized random forests framework to an RD context. Within bandwidth h, treatment is D = 1(X >= c). Two separate random forests estimate E[Y|Z, D=0] and E[Y|Z, D=1], and the CATE is their difference.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
x str

Running variable.

required
c float

RD cutoff.

0
covs list of str

Covariate names used as features for heterogeneity detection. Must not include the running variable x.

None
h float

Bandwidth (uses IK-style automatic selection if None).

None
n_trees int

Number of trees in each forest.

500
min_leaf int

Minimum leaf size (larger → more regularisation).

20
honesty bool

Split-sample (honest) estimation: half the data for tree construction, the other half for leaf predictions.

True
alpha float

Significance level for confidence intervals.

0.05
seed int

Random seed.

42

Returns:

Type Description
CausalResult

estimate = average CATE; detail = DataFrame with per-obs CATE and SE; model_info includes variable importance.

rd_boost

rd_boost(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, n_estimators: int = 200, max_depth: int = 3, learning_rate: float = 0.1, alpha: float = 0.05, seed: int = 42) -> CausalResult

Gradient Boosting for RD — flexible CATE estimation.

Fits separate GBM models on each side of the cutoff and estimates individual-level CATEs as mu_1(z) - mu_0(z). Standard errors are obtained via bootstrap.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
x str

Running variable.

required
c float

RD cutoff.

0
covs list of str

Covariate names for heterogeneity.

None
h float

Bandwidth (auto-selected if None).

None
n_estimators int

Number of boosting rounds.

200
max_depth int

Maximum tree depth per round.

3
learning_rate float

Shrinkage factor.

0.1
alpha float

Significance level.

0.05
seed int

Random seed.

42

Returns:

Type Description
CausalResult

estimate = average CATE; detail = per-obs CATE and bootstrap SE.

rd_lasso

rd_lasso(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, kernel: str = 'triangular', cv_folds: int = 5, alpha: float = 0.05) -> CausalResult

LASSO-assisted RD via post-double-selection.

Uses LASSO to select relevant covariates from a potentially large set, then runs a local linear RD with the selected covariates. Follows Belloni, Chernozhukov, and Hansen (2014).

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
x str

Running variable.

required
c float

RD cutoff.

0
covs list of str

Candidate covariates (can be large set).

None
h float

Bandwidth (auto-selected if None).

None
kernel str

Kernel for local linear regression ('triangular', 'uniform', 'epanechnikov').

'triangular'
cv_folds int

Cross-validation folds for LASSO penalty selection.

5
alpha float

Significance level.

0.05

Returns:

Type Description
CausalResult

estimate = RD treatment effect with LASSO-selected covariates; model_info includes selected_covariates and LASSO paths.

rd_cate_summary

rd_cate_summary(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, methods: Optional[List[str]] = None, alpha: float = 0.05, seed: int = 42) -> dict

Run multiple ML-RD methods and compare CATE estimates.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
x str

Running variable.

required
c float

RD cutoff.

0
covs list of str

Covariates for heterogeneity / selection.

None
h float

Bandwidth (shared across methods).

None
methods list of str

Subset of ['forest', 'boost', 'lasso']. Default: all three.

None
alpha float

Significance level.

0.05
seed int

Random seed.

42

Returns:

Type Description
dict

Keys = method names, values = CausalResult. Additional key 'comparison' = pd.DataFrame summarising ATEs. Additional key 'heterogeneity_drivers' = top variable importances from the forest (if run).

rd_extrapolate

rd_extrapolate(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, treatment: Optional[str] = None, eval_points: Optional[ndarray] = None, n_eval: int = 20, method: str = 'ols', h_local: Optional[float] = None, alpha: float = 0.05) -> CausalResult

Angrist-Rokkanen (2015) extrapolation of RD effects away from the cutoff.

If Y(0), Y(1) are independent of X conditional on covariates Z, then treatment effects can be estimated at any value of the running variable, not just at the cutoff.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
covs list of str

Covariate names for conditional independence. Required.

None
treatment str

Treatment variable for fuzzy RD. If None, sharp design assumed (D = 1{X >= c}).

None
eval_points ndarray

Running variable values at which to extrapolate the CATE. If None, n_eval equally spaced points spanning the data range.

None
n_eval int

Number of evaluation points when eval_points is None.

20
method str

Estimation method: 'ols', 'ipw', or 'doubly_robust'.

'ols'
h_local float

Bandwidth for local RD estimate at the cutoff (for comparison). If None, the MSE-optimal bandwidth from rdrobust is used.

None
alpha float

Significance level.

0.05

Returns:

Type Description
CausalResult

estimate is the average treatment effect across evaluation points. detail is a DataFrame with columns [x_value, cate, se, ci_lower, ci_upper]. model_info contains conditional independence test results and the local RD estimate for comparison.

Notes

The key identifying assumption is conditional independence: Y(0), Y(1) ⊥ X | Z. A partial F-test is run on each side of the cutoff to check whether X retains predictive power for Y after conditioning on Z. A rejection suggests the assumption may fail.

References

Angrist, J.D. and Rokkanen, M. (2015). "Wanna Get Away? Regression Discontinuity Estimation of Exam School Effects Away from the Cutoff." JASA, 110(512), 1331-1344. [@angrist2015wanna]

Examples:

>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> Z = rng.normal(0, 1, n)
>>> X = Z + rng.normal(0, 0.5, n)
>>> D = (X >= 0).astype(int)
>>> Y = 1.0 + 2.0 * Z + 3.0 * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> result = rd_extrapolate(df, y='y', x='x', c=0, covs=['z'])
>>> abs(result.estimate - 3.0) < 1.0
True

rd_multi_extrapolate

rd_multi_extrapolate(data: DataFrame, y: str, x: str, cutoffs: List[float], eval_points: Optional[ndarray] = None, method: str = 'linear', alpha: float = 0.05) -> CausalResult

Multi-cutoff RD extrapolation (Cattaneo, Keele, Titiunik, Vazquez-Bare 2021).

Estimates local RD effects at each cutoff, then interpolates/extrapolates a treatment effect function tau(x) through those point estimates.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
cutoffs list of float

Cutoff values. Must contain at least 2 cutoffs.

required
eval_points ndarray

Running variable values at which to predict tau(x). Defaults to 30 equally spaced points spanning the data range.

None
method str

Interpolation method:

  • 'linear': tau(x) = a + b*x
  • 'polynomial': polynomial of degree min(len(cutoffs)-1, 3)
  • 'weighted': inverse-variance weighted local linear
'linear'
alpha float

Significance level.

0.05

Returns:

Type Description
CausalResult

estimate is the average extrapolated effect. detail is a DataFrame with columns [x_value, cate_extrapolated, se, ci_lower, ci_upper]. model_info contains cutoff-specific estimates and heterogeneity test results.

References

Cattaneo, M.D., Keele, L., Titiunik, R. and Vazquez-Bare, G. (2021). "Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs." Journal of the American Statistical Association, 116(536), 1941-1952. doi:10.1080/01621459.2020.1751646 [@cattaneo2021extrapolating]

Examples:

>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 3000
>>> X = rng.uniform(-2, 4, n)
>>> tau_true = 2.0 + 0.5 * X
>>> D = ((X >= 1) | (X >= 3)).astype(int)
>>> Y = 0.5 * X + tau_true * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X})
>>> result = rd_multi_extrapolate(df, y='y', x='x', cutoffs=[1.0, 3.0])
>>> result.estimate > 0
True

rd_external_validity

rd_external_validity(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, target_x_range: Optional[tuple] = None, alpha: float = 0.05) -> dict

Diagnostic assessment of RD external validity.

Compares covariate distributions between the cutoff neighborhood and a target population, runs the conditional independence test, and provides a recommendation on whether extrapolation is credible.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
x str

Running variable name.

required
c float

RD cutoff value.

0
covs list of str

Covariate names for overlap and CI testing.

None
target_x_range tuple of (float, float)

Running variable range (x_low, x_high) defining the target population. Defaults to the full data range.

None
alpha float

Significance level.

0.05

Returns:

Type Description
dict

Keys:

  • 'local_estimate': local RD effect at the cutoff (CausalResult).
  • 'ci_test': conditional independence test results (dict or None).
  • 'overlap': covariate overlap statistics (dict or None).
  • 'extrapolated_estimate': extrapolated ATE for the target population if CI test passes (CausalResult or None).
  • 'recommendation': human-readable guidance string.

Examples:

>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> Z = rng.normal(0, 1, n)
>>> X = Z + rng.normal(0, 0.5, n)
>>> D = (X >= 0).astype(int)
>>> Y = 1.0 + 2.0 * Z + 3.0 * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> diag = rd_external_validity(df, y='y', x='x', c=0, covs=['z'])
>>> 'recommendation' in diag
True

rd_interference

rd_interference(data: DataFrame, y: str, running: str, neighbour_running: str, cutoff: float = 0.0, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05) -> RDInterferenceResult

Sharp RDD with network interference (Cabrelli-Marconi 2024).

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome.

required
running str

Own running variable.

required
neighbour_running str

Average running variable across neighbours (precomputed).

required
cutoff float
0.0
bandwidth float

Defaults to IQR of own running variable.

None
kernel str
'triangular'
alpha float
0.05

Returns:

Type Description
RDInterferenceResult

rd_multi_score

rd_multi_score(data: DataFrame, y: str, running_vars: List[str], cutoffs: List[float], bandwidth: float = None, kernel: str = 'triangular', alpha: float = 0.05) -> MultiScoreRDResult

Multi-score RDD: treatment if all running variables exceed cutoffs.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome.

required
running_vars list of str

Multiple running variables.

required
cutoffs list of float

One cutoff per running variable (same length).

required
bandwidth float

Defaults to median IQR across running vars.

None
kernel str
'triangular'
alpha float
0.05

Returns:

Type Description
MultiScoreRDResult

rd_distribution

rd_distribution(data: DataFrame, y: str, running: str, cutoff: float = 0.0, quantiles: Optional[ndarray] = None, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05) -> DistRDResult

Distribution-valued sharp RDD.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
running str
required
cutoff float
0.0
quantiles array - like

Defaults to (0.1, 0.25, 0.5, 0.75, 0.9).

None
bandwidth float
None
kernel str
'triangular'
alpha float
0.05

Returns:

Type Description
DistRDResult

rd_bayes_hte

rd_bayes_hte(data: DataFrame, y: str, running: str, covariates: List[str], cutoff: float = 0.0, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05, n_draws: int = 2000, seed: int = 0) -> BayesRDHTEResult

Bayesian RDD allowing CATE to depend on covariates.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
running str
required
covariates list of str
required
cutoff float
0.0
bandwidth float
None
kernel str
'triangular'
alpha float
0.05
n_draws int
2000
seed int
0

Returns:

Type Description
BayesRDHTEResult

rd_distributional_design

rd_distributional_design(data: DataFrame, y: str, running: str, cutoff: float = 0.0, quantiles: Optional[ndarray] = None, bandwidth: Optional[float] = None, kernel: str = 'triangular') -> DDDResult

Joint RDD + RKD on the conditional distribution of Y.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
running str
required
cutoff float
0.0
quantiles array - like
None
bandwidth float
None
kernel str
'triangular'

rd_bias_aware_fuzzy

rd_bias_aware_fuzzy(data: DataFrame, y: str, x: str, fuzzy: str, c: float = 0.0, M_y: Optional[float] = None, M_d: Optional[float] = None, h: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05, cluster: Optional[str] = None, n_grid: int = 401) -> CausalResult

Bias-aware confidence interval for fuzzy RD (Noack & Rothe 2024).

Constructs an Anderson--Rubin-style CI for the Wald-ratio fuzzy RD parameter that takes worst-case smoothing bias of the numerator and denominator local-linear estimates into account. The CI is robust to weak first stages and avoids the power asymmetry documented in Kaliski-Keane-Neal (2025).

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
x str

Running variable column.

required
fuzzy str

Treatment indicator column for the fuzzy first stage.

required
c float

RD cutoff.

0.0
M_y float

Smoothness bounds on |g_Y''| and |g_D''|. If None, both are estimated from a local-quadratic fit on each side of the cutoff (the same default as :func:rd_honest).

None
M_d float

Smoothness bounds on |g_Y''| and |g_D''|. If None, both are estimated from a local-quadratic fit on each side of the cutoff (the same default as :func:rd_honest).

None
h float

Bandwidth. If None, defaults to a Silverman pilot.

None
kernel str

Kernel.

``'triangular'``
alpha float

Significance level.

0.05
cluster str

Cluster variable for variance estimation.

None
n_grid int

Grid resolution for the AR-style test inversion.

401

Returns:

Type Description
CausalResult

Result with model_info['bias_aware'] containing M_y, M_d, naive_ci (standard CCT-style fuzzy CI), bias_aware_ci, rejection_grid and first_stage_F.

Notes

The bias-aware CI is the set of τ_0 for which the test statistic

T(τ_0) = (Δ̂_Y − τ_0 · Δ̂_D) / σ̂(τ_0)

has |T(τ_0)| ≤ cv(b(τ_0)) where b(τ_0) is the worst-case bias- to-noise ratio of the numerator–denominator combination. The grid inversion accommodates non-convex CIs that arise under weak first stages.

Examples:

>>> import statspai as sp
>>> r = sp.rd_bias_aware_fuzzy(df, y='earnings', x='age', fuzzy='retired',
...                            c=65.0)
>>> print(r.model_info['bias_aware']['bias_aware_ci'])

rd_dashboard

rd_dashboard(data: DataFrame, y: str, x: str, c: float = 0.0, covs: Optional[List[str]] = None, fuzzy: Optional[str] = None, bw_grid: Optional[Sequence[float]] = None, h: Optional[float] = None, figsize: Tuple[float, float] = (12, 9), title: Optional[str] = None, save: Optional[str] = None)

Four-panel RD diagnostic dashboard.

Combines the four checks every RD analysis should report:

  1. RD plot (top-left): IMSE-binned scatter with polynomial fit.
  2. Density discontinuity (top-right): :func:rdplotdensity output for a McCrary-style manipulation test.
  3. Covariate balance (bottom-left): mean of each cov on each side of the cutoff with point ranges.
  4. Bandwidth sensitivity (bottom-right): point estimate and CI across a grid of bandwidths.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x str
required
c float

Cutoff.

0.0
covs list of str

Covariates for the balance panel. If None, the panel shows the density's binomial near-cutoff test instead.

None
fuzzy str

Fuzzy treatment column (passed through to RD plot/sensitivity).

None
bw_grid sequence of float

Bandwidths to evaluate in the sensitivity panel. If None, uses [0.5, 0.75, 1.0, 1.25, 1.5, 2.0] × h_mse.

None
h float

Reference bandwidth used for plotting and as the basis for bw_grid. If None, MSE-optimal bandwidth from rdrobust.

None
figsize tuple
(12, 9)
title str

Suptitle.

None
save str

If a path is given, also save the figure (extension determines format).

None

Returns:

Type Description
(fig, axes) where axes is a 2x2 numpy array.

rd_compare

rd_compare(data: DataFrame, y: str, x: str, c: float = 0.0, methods: Sequence[str] = _DEFAULT_COMPARE_METHODS, fuzzy: Optional[str] = None, method_kwargs: Optional[Dict[str, Dict[str, Any]]] = None, alpha: float = 0.05) -> DataFrame

Compare multiple RD estimators on the same data.

Returns a tidy DataFrame: one row per method with point estimate, SE, p-value and CI. Useful for robustness tables across estimation families (local-polynomial vs honest vs local-randomisation vs flexible-covariate-adjusted).

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x str
required
c float
0.0
methods sequence of str

Method aliases recognised by the :data:sp.rd._RD_METHOD_ALIASES dispatcher. Default: ('rdrobust', 'honest', 'randinf').

_DEFAULT_COMPARE_METHODS
fuzzy str

Fuzzy treatment column passed through to all methods that accept it.

None
method_kwargs dict of dict

Per-method extra kwargs, e.g. {'rdrobust': {'kernel': 'uniform'}}.

None
alpha float

Confidence level forwarded to each estimator that accepts alpha.

0.05

Returns:

Type Description
DataFrame

Columns: method, estimate, se, pvalue, ci_lower, ci_upper, n_obs, status.

rd_robustness_table

rd_robustness_table(data: DataFrame, y: str, x: str, c: float = 0.0, fuzzy: Optional[str] = None, kernels: Sequence[str] = ('triangular', 'epanechnikov', 'uniform'), bwselects: Sequence[str] = ('mserd', 'cerrd', 'msetwo'), polynomials: Sequence[int] = (1, 2), donuts: Sequence[float] = (0.0,), covs: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> DataFrame

Sweep over (kernel, bwselect, polynomial, donut) and return a robustness table for reporting.

Each row is one specification. The DataFrame contains both "Conventional" and "Robust" point estimates and CIs from :func:rdrobust, plus the bandwidth used. Suitable for direct .to_latex(...) / .to_excel(...) export, or feed into :func:statspai.outreg2 for native multi-column tables.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x str
required
c float
0.0
fuzzy optional, forwarded to rdrobust
None
covs optional, forwarded to rdrobust
None
cluster optional, forwarded to rdrobust
None
kernels sequences

Specification grid.

('triangular', 'epanechnikov', 'uniform')
bwselects sequences

Specification grid.

('triangular', 'epanechnikov', 'uniform')
polynomials sequences

Specification grid.

('triangular', 'epanechnikov', 'uniform')
donuts sequences

Specification grid.

('triangular', 'epanechnikov', 'uniform')
alpha float
0.05

Returns:

Type Description
pd.DataFrame with one row per specification and columns:

kernel, bwselect, p, donut, h, b, estimate_conv, se_conv, ci_conv_lo, ci_conv_hi, pvalue_conv, estimate_rbc, se_rbc, ci_rbc_lo, ci_rbc_hi, pvalue_rbc, n_left, n_right, status

multi_cutoff_rd

multi_cutoff_rd(*args, **kwargs)

User-friendly alias for :func:sp.rdmc (multi-cutoff RD).

See :func:statspai.rd.rdmulti.rdmc for full documentation.

geographic_rd

geographic_rd(*args, **kwargs)

User-friendly alias for :func:sp.rdms (multi-score RD).

Geographic RD is the most common multi-score special case: the running score is a signed distance to a political boundary. Cattaneo et al. (2024) dispatch it via rdms; this alias makes the intent explicit.

boundary_rd

boundary_rd(*args, **kwargs)

User-friendly alias for :func:sp.rd2d (boundary discontinuity design).

Cattaneo, Titiunik & Yu (2025) boundary discontinuity design for 2D running variables (lat/long, for example).

multi_score_rd

multi_score_rd(*args, **kwargs)

User-friendly alias for :func:sp.rd_multi_score.

Multi-score RD when eligibility depends on more than one discontinuous rule (e.g. income AND age thresholds). Separate from boundary RDD in that the rules are axis-aligned.

fit

fit(data: Any = None, y: Any = None, x: Any = None, c: Any = 0, *, method: str = 'rdrobust', **kwargs: Any)

Alias for :func:_rd_dispatch. See sp.rd.__doc__ for usage.