statspai.rd¶
rd ¶
Regression Discontinuity (RD) module for StatsPAI.
Regression-discontinuity tools exposed through the StatsPAI API:
Core estimation: - Sharp, Fuzzy, and Kink RD estimation with robust bias-corrected inference (CCT 2014) - Covariate-adjusted local polynomial estimation (Calonico et al. 2019) - Donut-hole RD for manipulation near cutoff - Regression Discontinuity in Time (Hausman-Rapson 2018) - Multi-cutoff and multi-score RD (Cattaneo et al. 2024) - Boundary discontinuity / 2D RD designs (Cattaneo-Titiunik-Yu 2025)
Bandwidth selection: - MSE-optimal: mserd, msetwo, msecomb1, msecomb2 - CER-optimal: cerrd, certwo, cercomb1, cercomb2 (Calonico-Cattaneo-Farrell 2020) - Fuzzy-specific and covariate-adjusted bandwidth selection
Inference: - Honest confidence intervals (Armstrong-Kolesar 2018, 2020) - Local randomization inference with Fisher exact tests (Cattaneo-Titiunik-VB 2016) - Window selection and sensitivity analysis - Rosenbaum sensitivity bounds
Treatment effect heterogeneity: - CATE estimation via fully interacted local linear (Calonico et al. 2025) - ML + RD: Causal Forest, Gradient Boosting, LASSO-assisted RD
External validity: - Extrapolation away from cutoff (Angrist-Rokkanen 2015) - Multi-cutoff extrapolation (Cattaneo et al. 2024) - External validity diagnostics
Diagnostics & visualization: - Diagnostic dashboard (rdsummary) - IMSE-optimal binned scatter with pointwise CI bands - Density manipulation testing (CJM 2020) - Bandwidth sensitivity, covariate balance, placebo cutoff tests - Power analysis and sample size calculations
RDMultiResult ¶
Results from multi-cutoff/multi-score RD.
RDInterferenceResult
dataclass
¶
Direct + spillover RDD effects under network interference.
MultiScoreRDResult
dataclass
¶
Multi-score RDD effect on the boundary.
DistRDResult
dataclass
¶
RDD effect on each quantile.
BayesRDHTEResult
dataclass
¶
Bayesian RDD with HTE posterior summaries.
DDDResult
dataclass
¶
Distributional discontinuity design output.
rdplot ¶
rdplot(data: DataFrame, y: str, x: str, c: float = 0, nbins: Optional[int] = None, binselect: str = 'esmv', p: int = 4, kernel: str = 'triangular', ci_level: float = 0.95, shade_ci: bool = True, donut: float = 0, show_bw: bool = False, h: Optional[float] = None, covs: Optional[List[str]] = None, weights: Optional[str] = None, hide_ci: bool = False, scatter: bool = True, ax=None, figsize: tuple = (10, 7), title: Optional[str] = None, x_label: Optional[str] = None, y_label: Optional[str] = None)
RD plot: binned scatter with polynomial fit on each side of the cutoff.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome and running variable names. |
required |
x
|
str
|
Outcome and running variable names. |
required |
c
|
float
|
Cutoff. |
0
|
nbins
|
int
|
Bins per side. If None, uses data-driven selection via binselect. |
None
|
binselect
|
str
|
Bin selection method (when nbins=None): - 'es' : IMSE-optimal evenly spaced - 'espr' : IMSE-optimal evenly spaced (mimicking variance) - 'qs' : IMSE-optimal quantile-spaced - 'qspr' : IMSE-optimal quantile-spaced (mimicking variance) - 'esmv' : IMSE-optimal evenly spaced with variance mimicking (default) - 'qsmv' : IMSE-optimal quantile-spaced with variance mimicking |
'esmv'
|
p
|
int
|
Polynomial order for the fitted curve. |
4
|
kernel
|
str
|
Kernel for the fitted curve. |
'triangular'
|
ci_level
|
float
|
Confidence level for pointwise CI bands. |
0.95
|
shade_ci
|
bool
|
Show confidence interval bands around the polynomial fit. |
True
|
donut
|
float
|
If > 0, shades the donut region |x - c| <= donut. |
0
|
show_bw
|
bool
|
If True, shades the bandwidth window. |
False
|
h
|
float
|
Bandwidth to display. |
None
|
covs
|
list of str
|
Covariates to partial out before binning and plotting. |
None
|
weights
|
str
|
Column name for observation weights in polynomial fitting. |
None
|
hide_ci
|
bool
|
If True, suppress CI bands entirely. |
False
|
scatter
|
bool
|
Show binned scatter points. |
True
|
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(10, 7)
|
title
|
str
|
|
None
|
x_label
|
str
|
|
None
|
y_label
|
str
|
|
None
|
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
rdplotdensity ¶
rdplotdensity(data: DataFrame, x: str, c: float = 0, p: int = 2, n_grid: int = 50, h: Optional[float] = None, ci_level: float = 0.95, hist: bool = True, nbins: int = 30, ax=None, figsize: tuple = (10, 7), title: Optional[str] = None)
Boundary-adaptive density discontinuity plot at the RD cutoff.
Implements the local-polynomial density estimator of Cattaneo,
Jansson & Ma (2020, Journal of the American Statistical
Association 115(531), 1449-1455): for each side of the cutoff
the empirical CDF F̂ is constructed and a local polynomial of
order p is fit to F̂. The slope at the cutoff equals the
density f̂(c±), and the asymptotic variance comes from the same
local-polynomial sandwich. Boundary-adaptive — does not require
a separate boundary kernel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
x
|
str
|
Running variable. |
required |
c
|
float
|
Cutoff. |
0
|
p
|
int
|
Polynomial order for the CDF regression (p=2 recommended; p=1 is faster but with worse boundary behavior). |
2
|
n_grid
|
int
|
Grid points per side for the density curve. |
50
|
h
|
float
|
Bandwidth. If None, side-specific Silverman pilot. |
None
|
ci_level
|
float
|
Confidence level for CI bands. |
0.95
|
hist
|
bool
|
Overlay histogram. |
True
|
nbins
|
int
|
Number of histogram bins per side. |
30
|
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(10, 7)
|
title
|
str
|
|
None
|
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
References
Cattaneo, M.D., Jansson, M. and Ma, X. (2020). "Simple Local Polynomial Density Estimators." JASA 115(531), 1449-1455. [@cattaneo2020simple]
rdbwselect ¶
rdbwselect(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, deriv: int = 0, p: int = 1, q: Optional[int] = None, covs: Optional[List[str]] = None, kernel: str = 'triangular', bwselect: str = 'mserd', cluster: Optional[str] = None, all: bool = False) -> DataFrame
Bandwidth selection for local polynomial RD estimation.
Implements all eight MSE-optimal and CER-optimal bandwidth selection procedures from Calonico, Cattaneo, and Farrell (2020). MSE-optimal bandwidths minimize the mean squared error of the RD point estimator, while CER-optimal bandwidths minimize the coverage error rate of robust bias-corrected confidence intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable column name. |
required |
x
|
str
|
Running variable column name. |
required |
c
|
float
|
RD cutoff value. |
0
|
fuzzy
|
str
|
Treatment variable name for fuzzy RD. When provided, bandwidth accounts for first-stage variance in the Wald / IV estimator. |
None
|
deriv
|
int
|
Derivative order. 0 = standard RD (jump in level), 1 = regression kink design (change in slope). |
0
|
p
|
int
|
Polynomial order for point estimation (1 = local linear). |
1
|
q
|
int
|
Polynomial order for bias correction. Default is p + 1. |
None
|
covs
|
list of str
|
Covariate column names. When provided, the variance estimates used in bandwidth selection account for covariate adjustment, typically yielding narrower bandwidths. |
None
|
kernel
|
str
|
Kernel function: 'triangular', 'uniform', or 'epanechnikov'. |
'triangular'
|
bwselect
|
str
|
Bandwidth selection method. One of:
|
'mserd'
|
cluster
|
str
|
Cluster variable name for cluster-robust variance estimation. |
None
|
all
|
bool
|
If True, compute and return all eight bandwidth types. |
False
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
DataFrame with columns |
Notes
The CER-optimal bandwidth is related to the MSE-optimal bandwidth by:
h_CER = h_MSE * n^{-1/((2p+3)(2p+5))}
For p=1 (local linear), this gives h_CER ~ h_MSE * n^{-1/35}, which is strictly narrower than h_MSE. The narrower bandwidth yields confidence intervals with better coverage properties at the cost of slightly wider intervals.
The MSE-optimal bandwidth has rate n^{-1/(2p+3)} while the CER rate is n^{-1/(2p+3) - 1/((2p+3)(2p+5))}. For large samples the difference is meaningful: CER bandwidths produce robust CIs that achieve their nominal coverage rate, whereas MSE bandwidths can exhibit substantial coverage distortion.
References
Calonico, S., Cattaneo, M.D. and Farrell, M.H. (2020). "Optimal Bandwidth Choice for Robust Bias-Corrected Inference in Regression Discontinuity Designs." Econometrics Journal, 23(2), 192-210. [@calonico2020optimal]
Calonico, S., Cattaneo, M.D. and Titiunik, R. (2014). "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." Econometrica, 82(6), 2295-2326. [@calonico2014robust]
Examples:
Basic MSE-optimal bandwidth:
>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> X = rng.uniform(-1, 1, n)
>>> Y = 0.5 * X + 3.0 * (X >= 0) + rng.normal(0, 0.3, n)
>>> df = pd.DataFrame({'y': Y, 'x': X})
>>> bw = sp.rdbwselect(df, y='outcome', x='score', c=0)
>>> print(bw) # DataFrame with bandwidth info
Compare all eight bandwidth methods:
>>> bw_all = sp.rdbwselect(df, y='outcome', x='score', c=0, all=True)
>>> print(bw_all) # All 8 bandwidth methods compared
CER-optimal bandwidth for better coverage:
Fuzzy RD with covariates:
rdbwsensitivity ¶
rdbwsensitivity(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, p: int = 1, kernel: str = 'triangular', bw_grid: Optional[List[float]] = None, n_grid: int = 15, bw_range: tuple = (0.5, 2.0), alpha: float = 0.05, ax=None, figsize: tuple = (10, 6)) -> DataFrame
Bandwidth sensitivity analysis for RD estimates.
Re-estimates the RD effect across a grid of bandwidths to assess robustness. Plots point estimates with confidence intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome and running variable names. |
required |
x
|
str
|
Outcome and running variable names. |
required |
c
|
float
|
Cutoff. |
0
|
fuzzy
|
str
|
Treatment variable for fuzzy RD. |
None
|
p
|
int
|
Polynomial order. |
1
|
kernel
|
str
|
|
'triangular'
|
bw_grid
|
list of float
|
Explicit bandwidth values to evaluate. If None, auto-generates a grid as multiples of the MSE-optimal bandwidth. |
None
|
n_grid
|
int
|
Number of grid points if bw_grid is None. |
15
|
bw_range
|
tuple
|
Range of multipliers for the optimal bandwidth. |
(0.5, 2.0)
|
alpha
|
float
|
|
0.05
|
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(10, 6)
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: bandwidth, estimate, se, ci_lower, ci_upper, pvalue. |
rdbalance ¶
rdbalance(data: DataFrame, x: str, c: float = 0, covs: Optional[List[str]] = None, p: int = 1, kernel: str = 'triangular', h: Optional[float] = None, alpha: float = 0.05) -> DataFrame
Covariate balance test at the RD cutoff.
For each covariate, estimates the discontinuity at the cutoff using local polynomial regression. Pre-treatment covariates should show no significant jump at the cutoff if the RD design is valid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
Cutoff. |
0
|
covs
|
list of str
|
Covariate names to test. If None, tests all numeric columns except x. |
None
|
p
|
int
|
|
1
|
kernel
|
str
|
|
'triangular'
|
h
|
float
|
Manual bandwidth. If None, uses MSE-optimal per covariate. |
None
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: covariate, estimate, se, z, pvalue, significant. |
rdplacebo ¶
rdplacebo(data: DataFrame, y: str, x: str, c: float = 0, placebo_cutoffs: Optional[List[float]] = None, n_placebo: int = 10, side: str = 'both', fuzzy: Optional[str] = None, p: int = 1, kernel: str = 'triangular', alpha: float = 0.05, ax=None, figsize: tuple = (10, 6)) -> DataFrame
Placebo cutoff test for RD validity.
Estimates the RD effect at fake cutoff points where no treatment effect should exist. Significant effects at placebo cutoffs suggest the RD design may be invalid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome and running variable names. |
required |
x
|
str
|
Outcome and running variable names. |
required |
c
|
float
|
True cutoff. |
0
|
placebo_cutoffs
|
list of float
|
Explicit placebo cutoff values. If None, auto-generates from the data distribution. |
None
|
n_placebo
|
int
|
Number of placebo cutoffs if auto-generating. |
10
|
side
|
str
|
Which side of the true cutoff to place placebos: 'left', 'right', or 'both'. |
'both'
|
fuzzy
|
str
|
Treatment variable for fuzzy RD. |
None
|
p
|
int
|
|
1
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(10, 6)
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: cutoff, estimate, se, ci_lower, ci_upper, pvalue, is_true_cutoff. |
rdsummary ¶
rdsummary(data: DataFrame, y: str, x: str, c: float = 0, fuzzy: Optional[str] = None, covs: Optional[List[str]] = None, p: int = 1, kernel: str = 'triangular', alpha: float = 0.05, verbose: bool = True, plot: bool = False, full: bool = False) -> Dict[str, Any]
One-stop RD diagnostic battery.
Runs the main RD estimate plus all standard validation checks in a single call, returning a structured summary.
Diagnostics run (standard): 1. Main RD estimate (conventional + robust) 2. Density manipulation test (CJM 2020) 3. Covariate balance at cutoff (if covs provided) 4. Bandwidth sensitivity
Extended diagnostics (full=True): 5. Honest CI (Armstrong-Kolesár 2020) 6. Power analysis (MDE at 80%) 7. Placebo cutoff tests (5 placebos per side) 8. Multiple bandwidth selectors comparison
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome and running variable. |
required |
x
|
str
|
Outcome and running variable. |
required |
c
|
float
|
Cutoff. |
0
|
fuzzy
|
str
|
Treatment variable for fuzzy RD. |
None
|
covs
|
list of str
|
Pre-treatment covariates for balance test. |
None
|
p
|
int
|
|
1
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
verbose
|
bool
|
Print formatted summary to console. |
True
|
plot
|
bool
|
Generate a multi-panel diagnostic plot. |
False
|
full
|
bool
|
Run extended diagnostics (honest CI, power, placebos). |
False
|
Returns:
| Type | Description |
|---|---|
dict with keys:
|
'estimate': CausalResult from rdrobust 'density_test': CausalResult from rddensity 'balance': pd.DataFrame (if covs given) 'bw_sensitivity': pd.DataFrame 'honest_ci': CausalResult (if full=True) 'power': RDPowerResult (if full=True) 'placebos': pd.DataFrame (if full=True) 'bandwidth_comparison': pd.DataFrame (if full=True) 'figure': matplotlib Figure (if plot=True) |
rd_honest ¶
rd_honest(data: DataFrame, y: str, x: str, c: float = 0.0, M: Optional[float] = None, kernel: str = 'triangular', h: Optional[float] = None, alpha: float = 0.05, opt_criterion: str = 'mse') -> CausalResult
Honest confidence intervals for regression discontinuity designs.
Implements Armstrong & Kolesár (2018, 2020): CIs that are valid uniformly over the class of regression functions whose second derivative is bounded by M. These "honest" CIs account for the smoothing bias that standard local-polynomial CIs ignore, yielding correct coverage even in finite samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input data. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff. |
0
|
M
|
float
|
Upper bound on |f''(c)|. If |
None
|
kernel
|
str
|
Kernel function: |
"triangular"
|
h
|
float
|
Bandwidth. If |
None
|
alpha
|
float
|
Significance level for the confidence interval. |
0.05
|
opt_criterion
|
str
|
Bandwidth selection criterion when h is |
"mse"
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
Result object with |
rdmc ¶
rdmc(data: DataFrame, y: str, x: str, cutoffs: List[float], bandwidth: float = None, kernel: str = 'triangular', pooling: str = 'ivw', alpha: float = 0.05) -> RDMultiResult
Multi-cutoff RD design.
Estimates treatment effects at multiple cutoffs and pools them.
Equivalent to rdmulti::rdmc() in R/Stata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome variable. |
required |
x
|
str
|
Running variable. |
required |
cutoffs
|
list of float
|
Cutoff values. |
required |
bandwidth
|
float
|
Bandwidth for local polynomial. If None, uses Silverman rule. |
None
|
kernel
|
str
|
|
'triangular'
|
pooling
|
str
|
Pooling method: 'ivw' (inverse-variance weighted) or 'equal'. |
'ivw'
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
RDMultiResult
|
|
Examples:
rdms ¶
rdms(data: DataFrame, y: str, x1: str, x2: str, cutoff1: float = 0, cutoff2: float = 0, bandwidth: float = None, kernel: str = 'triangular', alpha: float = 0.05) -> CausalResult
Multi-score / Geographic RD design.
Handles two-dimensional running variables for geographic boundaries.
Equivalent to rdmulti::rdms() and Keele & Titiunik (2015).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome variable. |
required |
x1
|
str
|
First running variable (e.g., latitude distance to boundary). |
required |
x2
|
str
|
Second running variable (e.g., longitude distance to boundary). |
required |
cutoff1
|
float
|
Cutoff for x1. |
0
|
cutoff2
|
float
|
Cutoff for x2. |
0
|
bandwidth
|
float
|
|
None
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
Examples:
rdsampsi ¶
rdsampsi(tau: float, var_left: float = 1.0, var_right: float = 1.0, h_left: float = 0.5, h_right: float = 0.5, alpha: float = 0.05, target_power: float = 0.8, ratio: float = 1.0) -> RDSampSiResult
Minimum sample size for a given power in an RD design.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ratio
|
float
|
|
1.0
|
rdrandinf ¶
rdrandinf(data: DataFrame, y: str, x: str, c: float = 0, wl: Optional[float] = None, wr: Optional[float] = None, statistic: str = 'diffmeans', p: int = 0, covs: Optional[List[str]] = None, kernel: str = 'uniform', n_perms: int = 1000, fuzzy: Optional[str] = None, alpha: float = 0.05, seed: int = 42) -> CausalResult
Randomization inference for regression discontinuity designs.
Under the local randomization assumption, units within a small window around the cutoff are treated as if randomly assigned. Inference is based on Fisher's randomization test.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
wl
|
float
|
Window left bound offset from cutoff (typically negative).
The left edge of the window is |
None
|
wr
|
float
|
Window right bound offset from cutoff (typically positive).
The right edge of the window is |
None
|
statistic
|
str
|
Test statistic: 'diffmeans', 'ksmirnov', 'ranksum', or 'all'. |
'diffmeans'
|
p
|
int
|
Polynomial order for adjustment (0 = unadjusted). |
0
|
covs
|
list of str
|
Covariate names to partial out before testing. |
None
|
kernel
|
str
|
Kernel weighting (only 'uniform' currently supported for local randomization). |
'uniform'
|
n_perms
|
int
|
Number of permutations for Fisher randomization test. |
1000
|
fuzzy
|
str
|
Actual treatment variable for fuzzy RD. The Wald (IV) estimator is computed within the window. |
None
|
alpha
|
float
|
Significance level. |
0.05
|
seed
|
int
|
Random seed for reproducibility. |
42
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
Result with treatment effect estimate, permutation and asymptotic p-values, and confidence interval. |
Notes
The confidence interval is obtained by test inversion when
statistic='diffmeans': the set of hypothesised effect values
tau_0 that are not rejected by the permutation test at level alpha.
References
Cattaneo, M.D., Titiunik, R. and Vazquez-Bare, G. (2016). "Inference in Regression Discontinuity Designs under Local Randomization." The Stata Journal, 16(2), 331-367. [@cattaneo2016inference]
rdwinselect ¶
rdwinselect(data: DataFrame, x: str, c: float = 0, covs: Optional[List[str]] = None, wmin: Optional[float] = None, wstep: Optional[float] = None, nwindows: int = 10, statistic: str = 'diffmeans', p: int = 0, seed: int = 42, alpha: float = 0.15) -> DataFrame
Data-driven window selection for local randomization RD.
Tests covariate balance at successively larger windows around the cutoff. The recommended window is the largest for which all covariates remain balanced (p > alpha).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
covs
|
list of str
|
Covariate names to test balance for. If None, uses quantiles of the running variable as pseudo-covariates. |
None
|
wmin
|
float
|
Minimum half-window width. Defaults to the smallest gap between adjacent observations near the cutoff. |
None
|
wstep
|
float
|
Window increment. Defaults to |
None
|
nwindows
|
int
|
Number of windows to evaluate. |
10
|
statistic
|
str
|
Test statistic for balance testing. |
'diffmeans'
|
p
|
int
|
Polynomial order for adjustment. |
0
|
seed
|
int
|
Random seed. |
42
|
alpha
|
float
|
Significance level for balance (lenient by default to be conservative about window selection). |
0.15
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: window_left, window_right, n_left, n_right, p_value, balanced. Rows sorted by window width. |
rdsensitivity ¶
rdsensitivity(data: DataFrame, y: str, x: str, c: float = 0, wlist: Optional[List[float]] = None, nwindows: int = 20, statistic: str = 'diffmeans', p: int = 0, n_perms: int = 500, seed: int = 42, alpha: float = 0.05) -> DataFrame
Sensitivity of RD estimates across different window widths.
For each window, runs rdrandinf and records the estimate, standard
error, and p-value. Optionally produces a plot if matplotlib is
available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
wlist
|
list of float
|
Symmetric half-window widths to evaluate. If None, an evenly-spaced grid is generated automatically. |
None
|
nwindows
|
int
|
Number of windows when |
20
|
statistic
|
str
|
Test statistic for inference. |
'diffmeans'
|
p
|
int
|
Polynomial order for adjustment. |
0
|
n_perms
|
int
|
Number of permutations per window. |
500
|
seed
|
int
|
Random seed. |
42
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: window, estimate, se, pvalue, ci_lower, ci_upper, significant. |
rdrbounds ¶
rdrbounds(data: DataFrame, y: str, x: str, c: float = 0, wl: Optional[float] = None, wr: Optional[float] = None, gamma_list: Optional[List[float]] = None, statistic: str = 'ranksum', n_perms: int = 1000, seed: int = 42) -> DataFrame
Rosenbaum sensitivity bounds for RD under local randomization.
Assesses how much hidden bias (departure from random assignment) would be needed to explain away the estimated treatment effect. For each gamma (odds ratio), computes upper and lower p-value bounds under worst-case confounding.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
wl
|
float
|
Window left bound offset (typically negative). |
None
|
wr
|
float
|
Window right bound offset (typically positive). |
None
|
gamma_list
|
list of float
|
Odds ratios to evaluate. Defaults to [1, 1.5, 2, 2.5, 3, 4, 5]. gamma=1 is pure randomization; gamma>1 allows confounding. |
None
|
statistic
|
str
|
Test statistic (ranksum is standard for Rosenbaum bounds). |
'ranksum'
|
n_perms
|
int
|
Number of permutations for p-value computation. |
1000
|
seed
|
int
|
Random seed. |
42
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: gamma, pvalue_upper, pvalue_lower. |
Notes
Under Rosenbaum's model, if gamma=Gamma, the probability that unit i is treated satisfies:
1/(1+Gamma) <= P(D_i=1) <= Gamma/(1+Gamma)
instead of the uniform 1/2 under pure randomization. The bounds on the p-value are obtained by computing the worst-case assignment probabilities at each gamma level.
rdhte ¶
rdhte(data: DataFrame, y: str, x: str, z: Union[str, List[str]], c: float = 0, p: int = 1, h: Optional[float] = None, b: Optional[float] = None, kernel: str = 'triangular', bwselect: str = 'mserd', cluster: Optional[str] = None, alpha: float = 0.05, eval_points: Optional[ndarray] = None, n_eval: int = 20) -> CausalResult
Estimate conditional average treatment effects (CATE) in RD designs.
Fits a fully interacted local polynomial model on each side of the cutoff and computes CATE(z) = (alpha_R - alpha_L) + z'(gamma_R - gamma_L).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
z
|
str or list of str
|
Covariate(s) for treatment effect heterogeneity. |
required |
c
|
float
|
RD cutoff value. |
0
|
p
|
int
|
Polynomial order for the running variable (1 = local linear). |
1
|
h
|
float
|
Bandwidth for estimation. If None, MSE-optimal bandwidth is selected. |
None
|
b
|
float
|
Bandwidth for bias correction. Defaults to h. |
None
|
kernel
|
str
|
Kernel function: 'triangular', 'uniform', or 'epanechnikov'. |
'triangular'
|
bwselect
|
str
|
Bandwidth selection method: 'mserd' or 'msetwo'. |
'mserd'
|
cluster
|
str
|
Cluster variable name for cluster-robust standard errors. |
None
|
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
eval_points
|
ndarray
|
Z values at which to evaluate CATE. Each row is a point in Z-space. If None, n_eval equally spaced quantiles (10th to 90th pctile) are used. |
None
|
n_eval
|
int
|
Number of evaluation points when eval_points is not provided. |
20
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
Examples:
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> X = rng.uniform(-1, 1, n)
>>> Z = rng.normal(0, 1, n)
>>> tau_z = 2.0 + 1.5 * Z # CATE varies with Z
>>> Y = 0.5 * X + tau_z * (X >= 0) + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> result = rdhte(df, y='y', x='x', z='z', c=0)
>>> abs(result.estimate - 2.0) < 1.0 # average CATE near 2
True
rdbwhte ¶
rdbwhte(data: DataFrame, y: str, x: str, z: Union[str, List[str]], c: float = 0, p: int = 1, kernel: str = 'triangular') -> float
MSE-optimal bandwidth selection for the fully interacted RD model.
Accounts for the additional variance introduced by the covariate interaction terms compared to the standard local polynomial RD bandwidth.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
z
|
str or list of str
|
Covariate(s) for treatment effect heterogeneity. |
required |
c
|
float
|
RD cutoff value. |
0
|
p
|
int
|
Polynomial order. |
1
|
kernel
|
str
|
Kernel function. |
'triangular'
|
Returns:
| Type | Description |
|---|---|
float
|
MSE-optimal bandwidth. |
rdhte_lincom ¶
Compute a weighted linear combination of CATE estimates.
Useful for computing group-specific average treatment effects, e.g., the average CATE for males vs females.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
CausalResult
|
Result from rdhte(). |
required |
weights
|
ndarray
|
Linear combination weights. Must have length equal to the number of evaluation points in result.detail. |
required |
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys: 'estimate', 'se', 'ci', 'pvalue'. |
Examples:
Compute the average CATE for the first and second halves of Z:
rd2d_bw ¶
rd2d_bw(data: DataFrame, y: str, x1: str, x2: str, treatment: str, boundary: Optional[Callable] = None, approach: str = 'distance', p: int = 1, kernel: str = 'triangular') -> float
Bandwidth selection for 2D boundary RD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x1
|
str
|
Running variable names. |
required |
x2
|
str
|
Running variable names. |
required |
treatment
|
str
|
Binary treatment indicator. |
required |
boundary
|
callable
|
Boundary function f(x1) -> x2. None implies x1 = 0. |
None
|
approach
|
str
|
'distance' or 'location'. |
'distance'
|
p
|
int
|
Polynomial order. |
1
|
kernel
|
str
|
Kernel function. |
'triangular'
|
Returns:
| Type | Description |
|---|---|
float
|
MSE-optimal bandwidth. |
rd2d_plot ¶
rd2d_plot(data: DataFrame, y: str, x1: str, x2: str, treatment: str, boundary: Optional[Callable] = None, result: Optional[CausalResult] = None, plot_type: str = 'scatter', ax=None, figsize: tuple = (10, 8)) -> tuple
2D boundary RD visualization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x1
|
str
|
Running variable names. |
required |
x2
|
str
|
Running variable names. |
required |
treatment
|
str
|
Binary treatment indicator. |
required |
boundary
|
callable
|
Boundary function f(x1) -> x2. None implies x1 = 0. |
None
|
result
|
CausalResult
|
Result from |
None
|
plot_type
|
str
|
|
'scatter'
|
ax
|
matplotlib Axes
|
Pre-existing axes to draw on. |
None
|
figsize
|
tuple
|
Figure size. |
(10, 8)
|
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
rd_forest ¶
rd_forest(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, n_trees: int = 500, min_leaf: int = 20, honesty: bool = True, alpha: float = 0.05, seed: int = 42) -> CausalResult
Causal Forest for RD — heterogeneous treatment effect estimation.
Adapts the Athey-Wager (2019) generalized random forests framework to an RD context. Within bandwidth h, treatment is D = 1(X >= c). Two separate random forests estimate E[Y|Z, D=0] and E[Y|Z, D=1], and the CATE is their difference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable. |
required |
x
|
str
|
Running variable. |
required |
c
|
float
|
RD cutoff. |
0
|
covs
|
list of str
|
Covariate names used as features for heterogeneity detection. Must not include the running variable x. |
None
|
h
|
float
|
Bandwidth (uses IK-style automatic selection if None). |
None
|
n_trees
|
int
|
Number of trees in each forest. |
500
|
min_leaf
|
int
|
Minimum leaf size (larger → more regularisation). |
20
|
honesty
|
bool
|
Split-sample (honest) estimation: half the data for tree construction, the other half for leaf predictions. |
True
|
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
seed
|
int
|
Random seed. |
42
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
estimate = average CATE; detail = DataFrame with per-obs CATE and SE; model_info includes variable importance. |
rd_boost ¶
rd_boost(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, n_estimators: int = 200, max_depth: int = 3, learning_rate: float = 0.1, alpha: float = 0.05, seed: int = 42) -> CausalResult
Gradient Boosting for RD — flexible CATE estimation.
Fits separate GBM models on each side of the cutoff and estimates individual-level CATEs as mu_1(z) - mu_0(z). Standard errors are obtained via bootstrap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable. |
required |
x
|
str
|
Running variable. |
required |
c
|
float
|
RD cutoff. |
0
|
covs
|
list of str
|
Covariate names for heterogeneity. |
None
|
h
|
float
|
Bandwidth (auto-selected if None). |
None
|
n_estimators
|
int
|
Number of boosting rounds. |
200
|
max_depth
|
int
|
Maximum tree depth per round. |
3
|
learning_rate
|
float
|
Shrinkage factor. |
0.1
|
alpha
|
float
|
Significance level. |
0.05
|
seed
|
int
|
Random seed. |
42
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
estimate = average CATE; detail = per-obs CATE and bootstrap SE. |
rd_lasso ¶
rd_lasso(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, kernel: str = 'triangular', cv_folds: int = 5, alpha: float = 0.05) -> CausalResult
LASSO-assisted RD via post-double-selection.
Uses LASSO to select relevant covariates from a potentially large set, then runs a local linear RD with the selected covariates. Follows Belloni, Chernozhukov, and Hansen (2014).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable. |
required |
x
|
str
|
Running variable. |
required |
c
|
float
|
RD cutoff. |
0
|
covs
|
list of str
|
Candidate covariates (can be large set). |
None
|
h
|
float
|
Bandwidth (auto-selected if None). |
None
|
kernel
|
str
|
Kernel for local linear regression ('triangular', 'uniform', 'epanechnikov'). |
'triangular'
|
cv_folds
|
int
|
Cross-validation folds for LASSO penalty selection. |
5
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
estimate = RD treatment effect with LASSO-selected covariates; model_info includes selected_covariates and LASSO paths. |
rd_cate_summary ¶
rd_cate_summary(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, h: Optional[float] = None, methods: Optional[List[str]] = None, alpha: float = 0.05, seed: int = 42) -> dict
Run multiple ML-RD methods and compare CATE estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable. |
required |
x
|
str
|
Running variable. |
required |
c
|
float
|
RD cutoff. |
0
|
covs
|
list of str
|
Covariates for heterogeneity / selection. |
None
|
h
|
float
|
Bandwidth (shared across methods). |
None
|
methods
|
list of str
|
Subset of |
None
|
alpha
|
float
|
Significance level. |
0.05
|
seed
|
int
|
Random seed. |
42
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys = method names, values = CausalResult.
Additional key |
rd_extrapolate ¶
rd_extrapolate(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, treatment: Optional[str] = None, eval_points: Optional[ndarray] = None, n_eval: int = 20, method: str = 'ols', h_local: Optional[float] = None, alpha: float = 0.05) -> CausalResult
Angrist-Rokkanen (2015) extrapolation of RD effects away from the cutoff.
If Y(0), Y(1) are independent of X conditional on covariates Z, then treatment effects can be estimated at any value of the running variable, not just at the cutoff.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
covs
|
list of str
|
Covariate names for conditional independence. Required. |
None
|
treatment
|
str
|
Treatment variable for fuzzy RD. If None, sharp design assumed (D = 1{X >= c}). |
None
|
eval_points
|
ndarray
|
Running variable values at which to extrapolate the CATE.
If None, |
None
|
n_eval
|
int
|
Number of evaluation points when |
20
|
method
|
str
|
Estimation method: |
'ols'
|
h_local
|
float
|
Bandwidth for local RD estimate at the cutoff (for comparison).
If None, the MSE-optimal bandwidth from |
None
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
Notes
The key identifying assumption is conditional independence:
Y(0), Y(1) ⊥ X | Z. A partial F-test is run on each side of the
cutoff to check whether X retains predictive power for Y after
conditioning on Z. A rejection suggests the assumption may fail.
References
Angrist, J.D. and Rokkanen, M. (2015). "Wanna Get Away? Regression Discontinuity Estimation of Exam School Effects Away from the Cutoff." JASA, 110(512), 1331-1344. [@angrist2015wanna]
Examples:
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> Z = rng.normal(0, 1, n)
>>> X = Z + rng.normal(0, 0.5, n)
>>> D = (X >= 0).astype(int)
>>> Y = 1.0 + 2.0 * Z + 3.0 * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> result = rd_extrapolate(df, y='y', x='x', c=0, covs=['z'])
>>> abs(result.estimate - 3.0) < 1.0
True
rd_multi_extrapolate ¶
rd_multi_extrapolate(data: DataFrame, y: str, x: str, cutoffs: List[float], eval_points: Optional[ndarray] = None, method: str = 'linear', alpha: float = 0.05) -> CausalResult
Multi-cutoff RD extrapolation (Cattaneo, Keele, Titiunik, Vazquez-Bare 2021).
Estimates local RD effects at each cutoff, then interpolates/extrapolates a treatment effect function tau(x) through those point estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
cutoffs
|
list of float
|
Cutoff values. Must contain at least 2 cutoffs. |
required |
eval_points
|
ndarray
|
Running variable values at which to predict tau(x). Defaults to 30 equally spaced points spanning the data range. |
None
|
method
|
str
|
Interpolation method:
|
'linear'
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
References
Cattaneo, M.D., Keele, L., Titiunik, R. and Vazquez-Bare, G. (2021). "Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs." Journal of the American Statistical Association, 116(536), 1941-1952. doi:10.1080/01621459.2020.1751646 [@cattaneo2021extrapolating]
Examples:
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 3000
>>> X = rng.uniform(-2, 4, n)
>>> tau_true = 2.0 + 0.5 * X
>>> D = ((X >= 1) | (X >= 3)).astype(int)
>>> Y = 0.5 * X + tau_true * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X})
>>> result = rd_multi_extrapolate(df, y='y', x='x', cutoffs=[1.0, 3.0])
>>> result.estimate > 0
True
rd_external_validity ¶
rd_external_validity(data: DataFrame, y: str, x: str, c: float = 0, covs: Optional[List[str]] = None, target_x_range: Optional[tuple] = None, alpha: float = 0.05) -> dict
Diagnostic assessment of RD external validity.
Compares covariate distributions between the cutoff neighborhood and a target population, runs the conditional independence test, and provides a recommendation on whether extrapolation is credible.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input dataset. |
required |
y
|
str
|
Outcome variable name. |
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff value. |
0
|
covs
|
list of str
|
Covariate names for overlap and CI testing. |
None
|
target_x_range
|
tuple of (float, float)
|
Running variable range |
None
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys:
|
Examples:
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 2000
>>> Z = rng.normal(0, 1, n)
>>> X = Z + rng.normal(0, 0.5, n)
>>> D = (X >= 0).astype(int)
>>> Y = 1.0 + 2.0 * Z + 3.0 * D + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({'y': Y, 'x': X, 'z': Z})
>>> diag = rd_external_validity(df, y='y', x='x', c=0, covs=['z'])
>>> 'recommendation' in diag
True
rd_interference ¶
rd_interference(data: DataFrame, y: str, running: str, neighbour_running: str, cutoff: float = 0.0, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05) -> RDInterferenceResult
Sharp RDD with network interference (Cabrelli-Marconi 2024).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome. |
required |
running
|
str
|
Own running variable. |
required |
neighbour_running
|
str
|
Average running variable across neighbours (precomputed). |
required |
cutoff
|
float
|
|
0.0
|
bandwidth
|
float
|
Defaults to IQR of own running variable. |
None
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
RDInterferenceResult
|
|
rd_multi_score ¶
rd_multi_score(data: DataFrame, y: str, running_vars: List[str], cutoffs: List[float], bandwidth: float = None, kernel: str = 'triangular', alpha: float = 0.05) -> MultiScoreRDResult
Multi-score RDD: treatment if all running variables exceed cutoffs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome. |
required |
running_vars
|
list of str
|
Multiple running variables. |
required |
cutoffs
|
list of float
|
One cutoff per running variable (same length). |
required |
bandwidth
|
float
|
Defaults to median IQR across running vars. |
None
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
MultiScoreRDResult
|
|
rd_distribution ¶
rd_distribution(data: DataFrame, y: str, running: str, cutoff: float = 0.0, quantiles: Optional[ndarray] = None, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05) -> DistRDResult
Distribution-valued sharp RDD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
running
|
str
|
|
required |
cutoff
|
float
|
|
0.0
|
quantiles
|
array - like
|
Defaults to (0.1, 0.25, 0.5, 0.75, 0.9). |
None
|
bandwidth
|
float
|
|
None
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
DistRDResult
|
|
rd_bayes_hte ¶
rd_bayes_hte(data: DataFrame, y: str, running: str, covariates: List[str], cutoff: float = 0.0, bandwidth: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05, n_draws: int = 2000, seed: int = 0) -> BayesRDHTEResult
Bayesian RDD allowing CATE to depend on covariates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
running
|
str
|
|
required |
covariates
|
list of str
|
|
required |
cutoff
|
float
|
|
0.0
|
bandwidth
|
float
|
|
None
|
kernel
|
str
|
|
'triangular'
|
alpha
|
float
|
|
0.05
|
n_draws
|
int
|
|
2000
|
seed
|
int
|
|
0
|
Returns:
| Type | Description |
|---|---|
BayesRDHTEResult
|
|
rd_distributional_design ¶
rd_distributional_design(data: DataFrame, y: str, running: str, cutoff: float = 0.0, quantiles: Optional[ndarray] = None, bandwidth: Optional[float] = None, kernel: str = 'triangular') -> DDDResult
Joint RDD + RKD on the conditional distribution of Y.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
running
|
str
|
|
required |
cutoff
|
float
|
|
0.0
|
quantiles
|
array - like
|
|
None
|
bandwidth
|
float
|
|
None
|
kernel
|
str
|
|
'triangular'
|
rd_bias_aware_fuzzy ¶
rd_bias_aware_fuzzy(data: DataFrame, y: str, x: str, fuzzy: str, c: float = 0.0, M_y: Optional[float] = None, M_d: Optional[float] = None, h: Optional[float] = None, kernel: str = 'triangular', alpha: float = 0.05, cluster: Optional[str] = None, n_grid: int = 401) -> CausalResult
Bias-aware confidence interval for fuzzy RD (Noack & Rothe 2024).
Constructs an Anderson--Rubin-style CI for the Wald-ratio fuzzy RD parameter that takes worst-case smoothing bias of the numerator and denominator local-linear estimates into account. The CI is robust to weak first stages and avoids the power asymmetry documented in Kaliski-Keane-Neal (2025).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome column. |
required |
x
|
str
|
Running variable column. |
required |
fuzzy
|
str
|
Treatment indicator column for the fuzzy first stage. |
required |
c
|
float
|
RD cutoff. |
0.0
|
M_y
|
float
|
Smoothness bounds on |g_Y''| and |g_D''|. If |
None
|
M_d
|
float
|
Smoothness bounds on |g_Y''| and |g_D''|. If |
None
|
h
|
float
|
Bandwidth. If |
None
|
kernel
|
str
|
Kernel. |
``'triangular'``
|
alpha
|
float
|
Significance level. |
0.05
|
cluster
|
str
|
Cluster variable for variance estimation. |
None
|
n_grid
|
int
|
Grid resolution for the AR-style test inversion. |
401
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
Result with |
Notes
The bias-aware CI is the set of τ_0 for which the test statistic
T(τ_0) = (Δ̂_Y − τ_0 · Δ̂_D) / σ̂(τ_0)
has |T(τ_0)| ≤ cv(b(τ_0)) where b(τ_0) is the worst-case bias-
to-noise ratio of the numerator–denominator combination. The grid
inversion accommodates non-convex CIs that arise under weak first
stages.
Examples:
rd_dashboard ¶
rd_dashboard(data: DataFrame, y: str, x: str, c: float = 0.0, covs: Optional[List[str]] = None, fuzzy: Optional[str] = None, bw_grid: Optional[Sequence[float]] = None, h: Optional[float] = None, figsize: Tuple[float, float] = (12, 9), title: Optional[str] = None, save: Optional[str] = None)
Four-panel RD diagnostic dashboard.
Combines the four checks every RD analysis should report:
- RD plot (top-left): IMSE-binned scatter with polynomial fit.
- Density discontinuity (top-right): :func:
rdplotdensityoutput for a McCrary-style manipulation test. - Covariate balance (bottom-left): mean of each
covon each side of the cutoff with point ranges. - Bandwidth sensitivity (bottom-right): point estimate and CI across a grid of bandwidths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
x
|
str
|
|
required |
c
|
float
|
Cutoff. |
0.0
|
covs
|
list of str
|
Covariates for the balance panel. If None, the panel shows the density's binomial near-cutoff test instead. |
None
|
fuzzy
|
str
|
Fuzzy treatment column (passed through to RD plot/sensitivity). |
None
|
bw_grid
|
sequence of float
|
Bandwidths to evaluate in the sensitivity panel. If None,
uses |
None
|
h
|
float
|
Reference bandwidth used for plotting and as the basis for
|
None
|
figsize
|
tuple
|
|
(12, 9)
|
title
|
str
|
Suptitle. |
None
|
save
|
str
|
If a path is given, also save the figure (extension determines format). |
None
|
Returns:
| Type | Description |
|---|---|
(fig, axes) where axes is a 2x2 numpy array.
|
|
rd_compare ¶
rd_compare(data: DataFrame, y: str, x: str, c: float = 0.0, methods: Sequence[str] = _DEFAULT_COMPARE_METHODS, fuzzy: Optional[str] = None, method_kwargs: Optional[Dict[str, Dict[str, Any]]] = None, alpha: float = 0.05) -> DataFrame
Compare multiple RD estimators on the same data.
Returns a tidy DataFrame: one row per method with point estimate, SE, p-value and CI. Useful for robustness tables across estimation families (local-polynomial vs honest vs local-randomisation vs flexible-covariate-adjusted).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
x
|
str
|
|
required |
c
|
float
|
|
0.0
|
methods
|
sequence of str
|
Method aliases recognised by the :data: |
_DEFAULT_COMPARE_METHODS
|
fuzzy
|
str
|
Fuzzy treatment column passed through to all methods that accept it. |
None
|
method_kwargs
|
dict of dict
|
Per-method extra kwargs, e.g. |
None
|
alpha
|
float
|
Confidence level forwarded to each estimator that accepts |
0.05
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
rd_robustness_table ¶
rd_robustness_table(data: DataFrame, y: str, x: str, c: float = 0.0, fuzzy: Optional[str] = None, kernels: Sequence[str] = ('triangular', 'epanechnikov', 'uniform'), bwselects: Sequence[str] = ('mserd', 'cerrd', 'msetwo'), polynomials: Sequence[int] = (1, 2), donuts: Sequence[float] = (0.0,), covs: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> DataFrame
Sweep over (kernel, bwselect, polynomial, donut) and return a robustness table for reporting.
Each row is one specification. The DataFrame contains both
"Conventional" and "Robust" point estimates and CIs from
:func:rdrobust, plus the bandwidth used. Suitable for
direct .to_latex(...) / .to_excel(...) export, or feed
into :func:statspai.outreg2 for native multi-column tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
x
|
str
|
|
required |
c
|
float
|
|
0.0
|
fuzzy
|
optional, forwarded to rdrobust
|
|
None
|
covs
|
optional, forwarded to rdrobust
|
|
None
|
cluster
|
optional, forwarded to rdrobust
|
|
None
|
kernels
|
sequences
|
Specification grid. |
('triangular', 'epanechnikov', 'uniform')
|
bwselects
|
sequences
|
Specification grid. |
('triangular', 'epanechnikov', 'uniform')
|
polynomials
|
sequences
|
Specification grid. |
('triangular', 'epanechnikov', 'uniform')
|
donuts
|
sequences
|
Specification grid. |
('triangular', 'epanechnikov', 'uniform')
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
pd.DataFrame with one row per specification and columns:
|
kernel, bwselect, p, donut, h, b, estimate_conv, se_conv, ci_conv_lo, ci_conv_hi, pvalue_conv, estimate_rbc, se_rbc, ci_rbc_lo, ci_rbc_hi, pvalue_rbc, n_left, n_right, status |
multi_cutoff_rd ¶
User-friendly alias for :func:sp.rdmc (multi-cutoff RD).
See :func:statspai.rd.rdmulti.rdmc for full documentation.
geographic_rd ¶
User-friendly alias for :func:sp.rdms (multi-score RD).
Geographic RD is the most common multi-score special case: the running
score is a signed distance to a political boundary. Cattaneo et al.
(2024) dispatch it via rdms; this alias makes the intent explicit.
boundary_rd ¶
User-friendly alias for :func:sp.rd2d (boundary discontinuity design).
Cattaneo, Titiunik & Yu (2025) boundary discontinuity design for 2D running variables (lat/long, for example).
multi_score_rd ¶
User-friendly alias for :func:sp.rd_multi_score.
Multi-score RD when eligibility depends on more than one discontinuous rule (e.g. income AND age thresholds). Separate from boundary RDD in that the rules are axis-aligned.
fit ¶
fit(data: Any = None, y: Any = None, x: Any = None, c: Any = 0, *, method: str = 'rdrobust', **kwargs: Any)
Alias for :func:_rd_dispatch. See sp.rd.__doc__ for usage.