statspai.bounds
Bounds and Partial Identification for causal effects.
When point identification is not possible (e.g., due to sample selection, non-compliance, or missing data), bounds provide informative intervals for the treatment effect.
Functions:
| Name | Description |
|---|---|
| Lee Bounds | Bounds on ATE under sample selection |
| Manski Bounds | Worst-case bounds with minimal assumptions |
| Horowitz-Manski Bounds | Tighter bounds conditioning on covariates |
| IV Bounds | Bounds under imperfect instruments |
| Oster Delta | Coefficient stability / identified set |
| Selection Bounds | Lee bounds with covariates |
| Breakdown Frontier | Assumption robustness frontier |
References
Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. RES, 76(3), 1071-1102. [@lee2009training]
Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. AER P&P, 80(2), 319-323. [@manski1990nonparametric]
Horowitz, J. L. & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. JASA, 95(449), 77-84. [@horowitz2000nonparametric]
Nevo, A. & Rosen, A. M. (2012). Identification with Imperfect Instruments. Review of Economics and Statistics, 94(3), 659-671. [@nevo2012identification]
Oster, E. (2019). Unobservable Selection and Coefficient Stability. JBES, 37(2), 187-204. [@oster2019unobservable]
Masten, M. A. & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Econometrica, 89(3), 1449-1469. [@masten2021salvaging]
BoundsResult `dataclass`
Result container for partial identification / bounds estimation.
Attributes:
| Name | Type | Description |
|---|---|---|
| lower | float | Lower bound of the identified set. |
| upper | float | Upper bound of the identified set. |
| se_lower | float | Standard error of the lower bound. |
| se_upper | float | Standard error of the upper bound. |
| ci_lower | tuple | Confidence interval for the lower bound (lower_lo, lower_hi). |
| ci_upper | tuple | Confidence interval for the upper bound (upper_lo, upper_hi). |
| method | str | Name of the bounding method. |
| alpha | float | Significance level used for confidence intervals. |
| n_obs | int | Number of observations used. |
| model_info | dict | Additional method-specific information. |
plot
Interval plot showing the identified set with confidence intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ax | matplotlib Axes | Axes to draw on; created if None. | None |
| `**kwargs` | | Passed to | {} |

Returns:
| Type | Description |
|---|---|
| Axes | |
MLBoundsResult `dataclass`
ATE bounds produced by `ml_bounds`.
center_shift
Midpoint shift vs the classical Manski interval.
Under bounded-outcome Manski bounds, the width of the identification region is (y_max - y_min) by construction, so ML cannot shrink it without an additional structural assumption. What ML does shift is the midpoint, because it plugs in a covariate-aware estimate of the identifiable side instead of a marginal mean. This method reports that midpoint shift.
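The fixed width can be checked with a short derivation. The notation here is assumed for illustration and does not come from the library: $p = P(D=1)$, $\mu_1 = E[Y \mid D=1]$, $\mu_0 = E[Y \mid D=0]$, and the outcome is bounded in $[y_{\min}, y_{\max}]$. The worst-case bounds replace each unobserved counterfactual mean with an endpoint of that range:

$$
\begin{aligned}
\mathrm{ATE}_{L} &= \mu_1 p + y_{\min}(1-p) - \mu_0 (1-p) - y_{\max}\, p, \\
\mathrm{ATE}_{U} &= \mu_1 p + y_{\max}(1-p) - \mu_0 (1-p) - y_{\min}\, p, \\
\mathrm{ATE}_{U} - \mathrm{ATE}_{L} &= (y_{\max} - y_{\min})(1-p) + (y_{\max} - y_{\min})\, p = y_{\max} - y_{\min}, \\
\tfrac{1}{2}\left(\mathrm{ATE}_{L} + \mathrm{ATE}_{U}\right) &= \mu_1 p - \mu_0 (1-p) + \tfrac{1}{2}(y_{\min} + y_{\max})(1 - 2p).
\end{aligned}
$$

The width depends only on the outcome range, while the midpoint depends on the estimated observable terms, which is where a different estimator can move the interval.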
lee_bounds
lee_bounds(data: DataFrame, y: str, treat: str, selection: str, covariates: Optional[List[str]] = None, n_bootstrap: int = 500, alpha: float = 0.05, random_state: int = 42) -> CausalResult
Compute Lee (2009) bounds for ATE under sample selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data (including units with missing outcomes). | required |
| y | str | Outcome variable (may have NaN for selected-out units). | required |
| treat | str | Binary treatment variable (0/1). | required |
| selection | str | Binary selection/retention indicator (1 = observed, 0 = missing). | required |
| covariates | list of str | Not used in basic Lee bounds, reserved for conditional bounds. | None |
| n_bootstrap | int | Bootstrap iterations for inference. | 500 |
| alpha | float | Significance level. | 0.05 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| CausalResult | `estimate` = midpoint of bounds; `ci` = Imbens-Manski confidence interval for the identified set; `model_info` contains `'lower_bound'` and `'upper_bound'`. |
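As a rough illustration of what an Imbens-Manski interval looks like, the sketch below follows the construction in Imbens and Manski (2004), where the critical value is calibrated so that the interval covers the true parameter with probability 1 - alpha. The function name `imbens_manski_ci` and its arguments are hypothetical, and whether `lee_bounds` uses exactly this calibration is an assumption.

```python
from statistics import NormalDist


def imbens_manski_ci(lower, upper, se_lower, se_upper, alpha=0.05):
    """Hypothetical helper: Imbens-Manski style CI from estimated bounds.

    Solves Phi(c + width / max_se) - Phi(-c) = 1 - alpha for c by bisection
    and returns (lower - c * se_lower, upper + c * se_upper).
    """
    if upper < lower:
        raise ValueError("upper must be >= lower")
    max_se = max(se_lower, se_upper)
    if max_se == 0:
        return (lower, upper)

    norm = NormalDist()
    shift = (upper - lower) / max_se

    def coverage(c):
        return norm.cdf(c + shift) - norm.cdf(-c)

    # Coverage is increasing in c; the two-sided critical value always
    # over-covers, so the root lies between 0 and z_{1 - alpha/2}.
    lo, hi = 0.0, norm.inv_cdf(1 - alpha / 2)
    for _ in range(100):
        mid = (lo + hi) / 2
        if coverage(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    c = hi
    return (lower - c * se_lower, upper + c * se_upper)


if __name__ == "__main__":
    # Wide identified set: the critical value approaches the one-sided value.
    print(imbens_manski_ci(0.0, 2.0, 0.1, 0.1))
```

When the estimated bounds coincide, the critical value equals the usual two-sided one (about 1.96 at alpha = 0.05); as the identified set widens relative to the standard errors, it falls toward the one-sided value (about 1.645).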
Examples:
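A minimal sketch of the trimming idea behind Lee (2009) bounds, written in plain Python rather than calling `lee_bounds`. It assumes monotone selection (treatment moves selection in one direction only) and trims the group with the higher observation rate so that both groups have comparable selected shares. The helper name `lee_bounds_sketch` and the toy data are hypothetical, and the rounding of the trim count is a simplification.

```python
from statistics import mean


def lee_bounds_sketch(y, treat, selected):
    """Hypothetical helper: Lee-style trimming bounds on the effect for
    units whose outcome would be observed under either treatment state.

    y        : list of outcomes (None when the outcome is unobserved)
    treat    : list of 0/1 treatment indicators
    selected : list of 0/1 indicators (1 = outcome observed)
    """
    y1 = sorted(v for v, d, s in zip(y, treat, selected) if d == 1 and s == 1)
    y0 = sorted(v for v, d, s in zip(y, treat, selected) if d == 0 and s == 1)
    n1 = sum(1 for d in treat if d == 1)
    n0 = len(treat) - n1
    p1 = len(y1) / n1  # observation rate among treated
    p0 = len(y0) / n0  # observation rate among controls

    if p1 >= p0:
        # Excess selection among treated: trim treated outcomes.
        k = int(round((p1 - p0) / p1 * len(y1)))
        lower = mean(y1[: len(y1) - k]) - mean(y0)  # drop the k largest
        upper = mean(y1[k:]) - mean(y0)             # drop the k smallest
    else:
        # Excess selection among controls: trim control outcomes.
        k = int(round((p0 - p1) / p0 * len(y0)))
        lower = mean(y1) - mean(y0[k:])
        upper = mean(y1) - mean(y0[: len(y0) - k])
    return lower, upper


if __name__ == "__main__":
    treat = [1, 1, 1, 1, 0, 0, 0, 0]
    selected = [1, 1, 1, 1, 1, 1, 0, 0]
    y = [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, None, None]
    print(lee_bounds_sketch(y, treat, selected))  # (0.0, 20.0)
```

In the toy data every treated outcome is observed but only half of the control outcomes are, so half of the treated outcomes are trimmed from the top for the lower bound and from the bottom for the upper bound.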
manski_bounds
manski_bounds(data: DataFrame, y: str, treat: str, y_lower: Optional[float] = None, y_upper: Optional[float] = None, assumption: str = 'none', alpha: float = 0.05, n_bootstrap: int = 500, random_state: int = 42) -> CausalResult
Compute Manski (1990) worst-case bounds on ATE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| y | str | Outcome variable. | required |
| treat | str | Binary treatment variable (0/1). | required |
| y_lower | float | Known lower bound of the outcome. If None, uses observed min. | None |
| y_upper | float | Known upper bound of the outcome. If None, uses observed max. | None |
| assumption | str | Additional assumption: `'none'` (no assumptions, widest bounds); `'mtr'` (Monotone Treatment Response, Y(1) >= Y(0) for all units); `'mts'` (Monotone Treatment Selection, selection on levels). | 'none' |
| alpha | float | | 0.05 |
| n_bootstrap | int | | 500 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| CausalResult | |
Examples:
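A minimal sketch of the worst-case computation in plain Python, independent of how `manski_bounds` is implemented. Each unobserved counterfactual mean is replaced by the lower or upper end of the outcome range. The helper name `manski_bounds_sketch` and the `mtr` flag are hypothetical; the flag applies the standard consequence of Monotone Treatment Response, namely that the ATE cannot be negative.

```python
from statistics import mean


def manski_bounds_sketch(y, treat, y_lower, y_upper, mtr=False):
    """Hypothetical helper: worst-case (no-assumption) bounds on the ATE.

    y        : list of observed outcomes
    treat    : list of 0/1 treatment indicators
    y_lower, y_upper : logical range of the outcome
    mtr      : if True, impose Y(1) >= Y(0), which raises the lower bound to 0
    """
    p = sum(treat) / len(treat)  # share treated
    m1 = mean([v for v, d in zip(y, treat) if d == 1])
    m0 = mean([v for v, d in zip(y, treat) if d == 0])

    # Bounds on E[Y(1)]: observed for treated, unknown for controls.
    ey1_lo = m1 * p + y_lower * (1 - p)
    ey1_hi = m1 * p + y_upper * (1 - p)
    # Bounds on E[Y(0)]: observed for controls, unknown for treated.
    ey0_lo = m0 * (1 - p) + y_lower * p
    ey0_hi = m0 * (1 - p) + y_upper * p

    lower = ey1_lo - ey0_hi
    upper = ey1_hi - ey0_lo
    if mtr:
        lower = max(lower, 0.0)
    return lower, upper


if __name__ == "__main__":
    y = [1.0, 1.0, 0.0, 1.0]
    treat = [1, 1, 0, 0]
    print(manski_bounds_sketch(y, treat, 0.0, 1.0))            # (-0.25, 0.75)
    print(manski_bounds_sketch(y, treat, 0.0, 1.0, mtr=True))  # (0.0, 0.75)
```

With a binary outcome the no-assumption interval always has width 1, so it necessarily contains zero; the extra assumptions are what make the bounds informative about sign.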
horowitz_manski
horowitz_manski(data: DataFrame, y: str, treatment: str, covariates: List[str], y_lower: Optional[float] = None, y_upper: Optional[float] = None, n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult
Horowitz-Manski (2000) bounds conditioning on covariates.
Tighter than unconditional Manski bounds by averaging conditional bounds over the covariate distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| y | str | Outcome variable. | required |
| treatment | str | Binary treatment variable (0/1). | required |
| covariates | list of str | Covariates to condition on (discretised via quartiles for continuous variables). | required |
| y_lower | float | Known lower bound of Y. Defaults to observed min. | None |
| y_upper | float | Known upper bound of Y. Defaults to observed max. | None |
| n_boot | int | Bootstrap replications. | 500 |
| alpha | float | | 0.05 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| BoundsResult | |
Examples:
iv_bounds
iv_bounds(data: DataFrame, y: str, treatment: str, instrument: str, controls: Optional[List[str]] = None, assumption: str = 'monotone_iv', alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult
Nevo-Rosen (2012) bounds for LATE under imperfect instruments.
When the exclusion restriction may be violated, the LATE is no longer point-identified. Under weaker monotonicity assumptions on the direction of the violation, informative bounds can still be obtained.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| treatment | str | Endogenous treatment (binary). | required |
| instrument | str | Instrument variable (binary). | required |
| controls | list of str | Control variables (residualized out via OLS). | None |
| assumption | str | | 'monotone_iv' |
| alpha | float | | 0.05 |
| n_boot | int | | 500 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| BoundsResult | |
Examples:
oster_delta
oster_delta(data: DataFrame, y: str, x_base: List[str], x_controls: List[str], r_max: float = 1.3, delta_range: Tuple[float, float] = (-2.0, 2.0), n_grid: int = 200, alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult
Oster (2019) coefficient stability bounds and delta* computation.
Computes the identified set for the treatment coefficient beta under the assumption that selection on unobservables is proportional to selection on observables (delta), and R-squared would be at most r_max if all unobservables were included.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| x_base | list of str | Key treatment/variable(s) of interest. | required |
| x_controls | list of str | Additional controls whose inclusion tightens identification. | required |
| r_max | float | Maximum R-squared assumption. Oster recommends 1.3 * R-squared from the fully controlled regression. If <= 0, it is set to 1.3 * R_full automatically. | 1.3 |
| delta_range | tuple | Range of proportional selection parameter delta. | (-2, 2) |
| n_grid | int | Grid points for delta in the identified set computation. | 200 |
| alpha | float | | 0.05 |
| n_boot | int | | 500 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| BoundsResult | lower/upper give the identified set for beta at delta=1 (equal selection) and the given r_max. |
Examples:
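A minimal sketch of the coefficient-stability logic, using the linear approximation to the bias-adjusted coefficient from Oster (2019) rather than the exact solution; `oster_delta` may well compute something more precise. The inputs are the coefficient and R-squared from a short regression (without the extra controls) and a full regression (with them). The function names and the numbers are hypothetical.

```python
def oster_beta_star(beta_short, r2_short, beta_full, r2_full, r_max, delta=1.0):
    """Approximate bias-adjusted coefficient:
    beta* = beta_full - delta * (beta_short - beta_full)
                      * (r_max - r2_full) / (r2_full - r2_short)
    """
    if r2_full <= r2_short:
        raise ValueError("controls must raise R-squared")
    movement = beta_short - beta_full
    scale = (r_max - r2_full) / (r2_full - r2_short)
    return beta_full - delta * movement * scale


def oster_delta_star(beta_short, r2_short, beta_full, r2_full, r_max, target=0.0):
    """Delta at which the approximate beta* equals `target` (default 0)."""
    movement = beta_short - beta_full
    if movement == 0 or r_max == r2_full:
        raise ValueError("delta* is undefined without coefficient or R-squared movement")
    return (beta_full - target) * (r2_full - r2_short) / (movement * (r_max - r2_full))


if __name__ == "__main__":
    # Hypothetical regression summaries.
    beta_short, r2_short = 1.0, 0.25
    beta_full, r2_full = 0.75, 0.5
    r_max = 1.3 * r2_full  # the 1.3 * R_full rule mentioned above

    beta_star = oster_beta_star(beta_short, r2_short, beta_full, r2_full, r_max)
    identified_set = tuple(sorted((beta_full, beta_star)))
    print(identified_set)
    print(oster_delta_star(beta_short, r2_short, beta_full, r2_full, r_max))
```

In this toy case the identified set at delta = 1 runs from roughly 0.6 to 0.75 and excludes zero, and unobservables would need to be about five times as important as the observed controls to drive the coefficient to zero.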
selection_bounds
selection_bounds(data: DataFrame, y: str, treatment: str, selection: str, covariates: Optional[List[str]] = None, method: str = 'conditional', n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult
Lee (2009) bounds for ATE under sample selection, optionally conditioning on covariates for tighter bounds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable (may have NaN when selection=0). | required |
| treatment | str | Binary treatment (0/1). | required |
| selection | str | Binary indicator: 1 = outcome observed, 0 = missing. | required |
| covariates | list of str | Covariates to condition on for tighter (conditional) bounds. | None |
| method | str | | 'conditional' |
| n_boot | int | | 500 |
| alpha | float | | 0.05 |
| random_state | int | | 42 |

Returns:
| Type | Description |
|---|---|
| BoundsResult | |
Examples:
breakdown_frontier
breakdown_frontier(estimate: float, se: float, assumption: str = 'parallel_trends', max_violation: float = 0.1, n_grid: int = 100, alpha: float = 0.05) -> BoundsResult
Masten-Poirier (2021) breakdown frontier for qualitative conclusions.
For a given point estimate and standard error, computes how much an identifying assumption can be violated before a qualitative conclusion (e.g., "positive treatment effect") breaks down.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| estimate | float | Point estimate of the treatment effect. | required |
| se | float | Standard error of the estimate. | required |
| assumption | str | Label for the identifying assumption being relaxed. Currently supports a generic linear violation model applicable to | 'parallel_trends' |
| max_violation | float | Maximum magnitude of the assumption violation to explore. | 0.1 |
| n_grid | int | Grid resolution for the frontier. | 100 |
| alpha | float | | 0.05 |

Returns:
| Type | Description |
|---|---|
| BoundsResult | lower/upper give the identified set at the maximum violation. |
Examples:
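A minimal sketch of one way a linear violation model can work, offered as an assumption rather than a description of `breakdown_frontier` internals. It assumes a violation of magnitude c can move the true effect by at most c in either direction, so the identified set is [estimate - c, estimate + c], and the sign conclusion survives as long as the normal-approximation confidence band around that set excludes zero. The helper name `breakdown_sketch` and the returned keys are hypothetical.

```python
from statistics import NormalDist


def breakdown_sketch(estimate, se, max_violation=0.1, n_grid=100, alpha=0.05):
    """Hypothetical helper: sensitivity of a sign conclusion to a violation c.

    Assumes the identified set under a violation of size c is
    [estimate - c, estimate + c] (a simple additive-bias model).
    """
    if n_grid < 2:
        raise ValueError("n_grid must be at least 2")
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # Confidence band around the identified set at each violation size.
    grid = [max_violation * i / (n_grid - 1) for i in range(n_grid)]
    frontier = [(c, estimate - c - z * se, estimate + c + z * se) for c in grid]

    # Largest violation for which the band still excludes zero.
    breakdown_point = max(0.0, abs(estimate) - z * se)

    return {
        "lower": estimate - max_violation,  # identified set at max violation
        "upper": estimate + max_violation,
        "breakdown_point": breakdown_point,
        "frontier": frontier,
    }


if __name__ == "__main__":
    res = breakdown_sketch(estimate=0.5, se=0.1, max_violation=0.1)
    print(res["lower"], res["upper"], round(res["breakdown_point"], 3))
```

For an estimate of 0.5 with standard error 0.1, the conclusion of a positive effect in this sketch holds until the violation reaches about 0.30, which is well beyond the explored maximum of 0.1.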