
statspai.bounds

bounds

Bounds and Partial Identification for causal effects.

When point identification is not possible (e.g., due to sample selection, non-compliance, or missing data), bounds provide informative intervals for the treatment effect.

Functions:

- **Lee Bounds** (`lee_bounds`): Bounds on ATE under sample selection
- **Manski Bounds** (`manski_bounds`): Worst-case bounds with minimal assumptions
- **Horowitz-Manski Bounds** (`horowitz_manski`): Tighter bounds conditioning on covariates
- **IV Bounds** (`iv_bounds`): Bounds under imperfect instruments
- **Oster Delta** (`oster_delta`): Coefficient stability / identified set
- **Selection Bounds** (`selection_bounds`): Lee bounds with covariates
- **Breakdown Frontier** (`breakdown_frontier`): Assumption robustness frontier
References

Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. RES, 76(3), 1071-1102. [@lee2009training]

Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. AER P&P, 80(2), 319-323. [@manski1990nonparametric]

Horowitz, J. L. & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. JASA, 95(449), 77-84. [@horowitz2000nonparametric]

Nevo, A. & Rosen, A. M. (2012). Identification with Imperfect Instruments. REStat, 94(3), 659-671. [@nevo2012identification]

Oster, E. (2019). Unobservable Selection and Coefficient Stability. JBES, 37(2), 187-204. [@oster2019unobservable]

Masten, M. A. & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Econometrica, 89(3), 1449-1469. [@masten2021salvaging]

BoundsResult dataclass

Result container for partial identification / bounds estimation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `lower` | float | Lower bound of the identified set. |
| `upper` | float | Upper bound of the identified set. |
| `se_lower` | float | Standard error of the lower bound. |
| `se_upper` | float | Standard error of the upper bound. |
| `ci_lower` | tuple | Confidence interval for the lower bound, `(lower_lo, lower_hi)`. |
| `ci_upper` | tuple | Confidence interval for the upper bound, `(upper_lo, upper_hi)`. |
| `method` | str | Name of the bounding method. |
| `alpha` | float | Significance level used for confidence intervals. |
| `n_obs` | int | Number of observations used. |
| `model_info` | dict | Additional method-specific information. |

width property

width: float

Width of the identified set.

midpoint property

midpoint: float

Midpoint of the identified set.

includes_zero

includes_zero() -> bool

Check whether the identified set includes zero.

summary

summary() -> str

Return a formatted summary string.
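The helpers above are described only in words. Here is a minimal sketch of their natural reading. It uses a hypothetical `IntervalSketch` class rather than the real `BoundsResult`. The sketch assumes that width is `upper - lower`, that the midpoint is the average of the endpoints, and that `includes_zero` treats the endpoints as inside the set. These are assumptions, not details taken from the statspai source.

```python
from dataclasses import dataclass


@dataclass
class IntervalSketch:
    """Hypothetical stand-in for BoundsResult keeping only the two endpoints."""

    lower: float
    upper: float

    @property
    def width(self) -> float:
        # Length of the identified set.
        return self.upper - self.lower

    @property
    def midpoint(self) -> float:
        # Centre of the identified set.
        return (self.lower + self.upper) / 2

    def includes_zero(self) -> bool:
        # Endpoints count as inside the set (assumed convention).
        return self.lower <= 0.0 <= self.upper


bounds = IntervalSketch(lower=-0.4, upper=0.6)
print(bounds.width, bounds.midpoint, bounds.includes_zero())
```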

plot

plot(ax=None, **kwargs)

Interval plot showing the identified set with confidence intervals.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `ax` | matplotlib Axes | Axes to draw on; created if `None`. | `None` |
| `**kwargs` | | Passed to `ax.errorbar`. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| Axes | The matplotlib Axes containing the plot. |

MLBoundsResult dataclass

ATE bounds produced by `ml_bounds`.

center_shift

center_shift() -> float

Midpoint shift vs the classical Manski interval.

Under the bounded-outcome Manski bounds, the width of the identification region is `y_max - y_min` by construction; ML cannot shrink it without an additional structural assumption. What ML does shift is the midpoint, because it plugs in a covariate-aware estimate of the identified terms instead of marginal means. This method reports that midpoint shift.
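The derivation below shows why the width is fixed. It assumes a binary treatment D with p = P(D=1), observed means mu_d = E[Y | D=d], and outcomes known to lie in [y_min, y_max]. Each unobserved counterfactual mean is replaced by a worst-case endpoint.

```latex
\begin{aligned}
E[Y(1)] &\in \big[\, p\mu_1 + (1-p)\,y_{\min},\; p\mu_1 + (1-p)\,y_{\max} \,\big] \\
E[Y(0)] &\in \big[\, (1-p)\mu_0 + p\,y_{\min},\; (1-p)\mu_0 + p\,y_{\max} \,\big] \\
\mathrm{ATE}_L &= p\mu_1 + (1-p)\,y_{\min} - (1-p)\mu_0 - p\,y_{\max} \\
\mathrm{ATE}_U &= p\mu_1 + (1-p)\,y_{\max} - (1-p)\mu_0 - p\,y_{\min} \\
\mathrm{ATE}_U - \mathrm{ATE}_L &= (1-p)(y_{\max} - y_{\min}) + p\,(y_{\max} - y_{\min}) = y_{\max} - y_{\min} \\
\tfrac{1}{2}\big(\mathrm{ATE}_L + \mathrm{ATE}_U\big) &= p\mu_1 - (1-p)\mu_0 + \tfrac{1-2p}{2}\,(y_{\min} + y_{\max})
\end{aligned}
```

Only the outcome support enters the width. The midpoint depends on the identified terms p·mu_1 and (1-p)·mu_0, and that is where a covariate-aware plug-in can move it.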

lee_bounds

lee_bounds(data: DataFrame, y: str, treat: str, selection: str, covariates: Optional[List[str]] = None, n_bootstrap: int = 500, alpha: float = 0.05, random_state: int = 42) -> CausalResult

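Compute Lee (2009) bounds on the average treatment effect under sample selection. The bounds apply to the always-observed subpopulation, meaning units whose outcome would be observed under either treatment status. They rely on Lee's monotonicity assumption that treatment shifts selection in only one direction.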

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data, including units with missing outcomes. | required |
| `y` | str | Outcome variable (may be NaN for selected-out units). | required |
| `treat` | str | Binary treatment variable (0/1). | required |
| `selection` | str | Binary selection/retention indicator (1 = observed, 0 = missing). | required |
| `covariates` | list of str | Not used by the basic Lee bounds; reserved for conditional bounds. | `None` |
| `n_bootstrap` | int | Bootstrap iterations for inference. | `500` |
| `alpha` | float | Significance level. | `0.05` |
| `random_state` | int | Random seed for the bootstrap. | `42` |

Returns:

| Type | Description |
| --- | --- |
| CausalResult | `estimate` is the midpoint of the bounds; `ci` is the Imbens-Manski confidence interval for the partially identified effect; `model_info` contains `'lower_bound'` and `'upper_bound'`. |
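The Returns entry mentions an Imbens-Manski interval. Below is a minimal sketch of how such an interval is commonly built from the two bound estimates and their standard errors. The helper name `imbens_manski_ci` is hypothetical, and statspai may obtain the standard errors (for example by bootstrap) or the critical value differently.

```python
import math


def normal_cdf(x: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def imbens_manski_ci(lower, upper, se_lower, se_upper, alpha=0.05):
    """Imbens-Manski (2004) interval for a partially identified parameter.

    The critical value c solves
        Phi(c + width / max(se_lower, se_upper)) - Phi(-c) = 1 - alpha,
    so it lies between the one-sided and two-sided normal quantiles.
    """
    width = max(upper - lower, 0.0)
    scale = max(se_lower, se_upper)

    def gap(c):
        return normal_cdf(c + width / scale) - normal_cdf(-c) - (1.0 - alpha)

    # gap is increasing in c, negative at 0 and positive at 10: bisect.
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return lower - c * se_lower, upper + c * se_upper, c


print(imbens_manski_ci(0.1, 0.3, 0.05, 0.05))
```

When the bounds coincide, c equals the two-sided quantile (about 1.96 at alpha = 0.05). When the set is wide relative to the standard errors, c shrinks toward the one-sided quantile (about 1.645).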

Examples:

```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with columns 'wage', 'training', 'employed'
>>> result = sp.lee_bounds(df, y='wage', treat='training',
...                        selection='employed')
>>> print(f"Bounds: [{result.model_info['lower_bound']:.3f}, "
...       f"{result.model_info['upper_bound']:.3f}]")
```
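The trimming idea behind Lee (2009) bounds can be sketched with NumPy for intuition. The `lee_bounds_sketch` helper is hypothetical. It assumes treatment weakly raises the selection rate and trims by rounding to a whole number of observations. statspai's `lee_bounds` may handle the direction of selection, ties and fractional trimming differently.

```python
import numpy as np


def lee_bounds_sketch(y, treat, selected):
    """Lee (2009) trimming bounds, assuming treatment weakly raises selection.

    y: outcomes (entries for unselected units are ignored, NaN is fine)
    treat, selected: 0/1 arrays of the same length as y
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat).astype(bool)
    selected = np.asarray(selected).astype(bool)

    q1 = selected[treat].mean()    # selection rate among treated
    q0 = selected[~treat].mean()   # selection rate among controls
    if q1 < q0:
        raise ValueError("sketch assumes treatment weakly increases selection")

    y1 = np.sort(y[treat & selected])      # observed treated outcomes
    y0_mean = y[~treat & selected].mean()  # observed control mean

    # Trim the excess share (q1 - q0) / q1 of observed treated outcomes,
    # rounded to a whole number of observations k.
    k = int(round((q1 - q0) / q1 * len(y1)))
    lower = y1[: len(y1) - k].mean() - y0_mean  # drop the k largest
    upper = y1[k:].mean() - y0_mean             # drop the k smallest
    return lower, upper


rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n)
selected = (rng.random(n) < np.where(treat == 1, 0.8, 0.6)).astype(int)
y = rng.normal(1.0 + 0.5 * treat, 1.0)
print(lee_bounds_sketch(y, treat, selected))
```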

manski_bounds

manski_bounds(data: DataFrame, y: str, treat: str, y_lower: Optional[float] = None, y_upper: Optional[float] = None, assumption: str = 'none', alpha: float = 0.05, n_bootstrap: int = 500, random_state: int = 42) -> CausalResult

Compute Manski (1990) worst-case bounds on ATE.

Parameters:

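| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data. | required |
| `y` | str | Outcome variable. | required |
| `treat` | str | Binary treatment variable (0/1). | required |
| `y_lower` | float | Known lower bound of the outcome. If `None`, the observed minimum is used. | `None` |
| `y_upper` | float | Known upper bound of the outcome. If `None`, the observed maximum is used. | `None` |
| `assumption` | str | Additional assumption: `'none'`, no assumptions (widest bounds); `'mtr'`, Monotone Treatment Response, Y(1) >= Y(0) for every unit; `'mts'`, Monotone Treatment Selection (units that select into treatment have weakly higher mean potential outcomes). | `'none'` |
| `alpha` | float | Significance level. | `0.05` |
| `n_bootstrap` | int | Bootstrap iterations for inference. | `500` |
| `random_state` | int | Random seed for the bootstrap. | `42` |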

Returns:

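| Type | Description |
| --- | --- |
| CausalResult | Bounds are reported in `model_info['lower_bound']` and `model_info['upper_bound']`. |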

Examples:

```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with a 0/1 outcome 'employed' and treatment 'training'
>>> result = sp.manski_bounds(df, y='employed', treat='training',
...                           y_lower=0, y_upper=1)
>>> print(f"Bounds: [{result.model_info['lower_bound']:.3f}, "
...       f"{result.model_info['upper_bound']:.3f}]")
```
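Here is a minimal NumPy sketch of the worst-case bounds and the `'mtr'` variant. It uses a hypothetical helper, `manski_ate_bounds`, that takes the outcome limits explicitly. The sketch omits `'mts'`, while the real `manski_bounds` also bootstraps standard errors.

```python
import numpy as np


def manski_ate_bounds(y, treat, y_lower, y_upper, assumption="none"):
    """Worst-case (Manski 1990) bounds on the ATE for a binary treatment.

    assumption="mtr" adds Monotone Treatment Response, Y(1) >= Y(0) for
    every unit, under which the sharp lower bound on the ATE is 0.
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat).astype(bool)
    p = treat.mean()          # P(D = 1)
    mu1 = y[treat].mean()     # E[Y | D = 1]
    mu0 = y[~treat].mean()    # E[Y | D = 0]

    # Unobserved counterfactual means are replaced by the outcome limits.
    ey1 = (p * mu1 + (1 - p) * y_lower, p * mu1 + (1 - p) * y_upper)
    ey0 = ((1 - p) * mu0 + p * y_lower, (1 - p) * mu0 + p * y_upper)

    lower, upper = ey1[0] - ey0[1], ey1[1] - ey0[0]
    if assumption == "mtr":
        lower = 0.0  # upper bound is unchanged under MTR
    elif assumption != "none":
        raise ValueError("this sketch only handles 'none' and 'mtr'")
    return lower, upper


y = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
d = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(manski_ate_bounds(y, d, 0, 1))
print(manski_ate_bounds(y, d, 0, 1, assumption="mtr"))
```

With a 0/1 outcome, half the sample treated, and observed means 0.6 and 0.4, the worst-case interval is roughly [-0.4, 0.6]. Its width is 1, the full outcome range.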

horowitz_manski

horowitz_manski(data: DataFrame, y: str, treatment: str, covariates: List[str], y_lower: Optional[float] = None, y_upper: Optional[float] = None, n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult

Horowitz-Manski (2000) bounds conditioning on covariates.

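Computes Manski bounds within covariate cells and averages them over the covariate distribution. This can tighten the unconditional Manski bounds when the outcome support differs across cells. With the same [y_lower, y_upper] in every cell, the averaged bounds coincide with the unconditional ones.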

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data. | required |
| `y` | str | Outcome variable. | required |
| `treatment` | str | Binary treatment variable (0/1). | required |
| `covariates` | list of str | Covariates to condition on (continuous variables are discretised into quartiles). | required |
| `y_lower` | float | Known lower bound of Y. Defaults to the observed minimum. | `None` |
| `y_upper` | float | Known upper bound of Y. Defaults to the observed maximum. | `None` |
| `n_boot` | int | Bootstrap replications. | `500` |
| `alpha` | float | Significance level. | `0.05` |
| `random_state` | int | Random seed for the bootstrap. | `42` |

Returns:

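| Type | Description |
| --- | --- |
| BoundsResult | Identified set with standard errors and confidence intervals (see `BoundsResult`). |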

Examples:

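```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with columns 'wage', 'trained', 'age', 'education'
>>> result = sp.horowitz_manski(
...     data=df, y="wage", treatment="trained",
...     covariates=["age", "education"],
...     y_lower=0, y_upper=100,
... )
>>> print(result.summary())
```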
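Below is a sketch of the conditional averaging, using a single covariate cut into quartile cells. The helper `conditional_manski_bounds` is hypothetical. So is its use of per-cell observed minima and maxima when no limits are given. With a fixed common support, the sketch reproduces the unconditional Manski bounds exactly by the law of iterated expectations.

```python
import numpy as np


def conditional_manski_bounds(y, treat, x, y_lower=None, y_upper=None):
    """Average cell-by-cell Manski ATE bounds over the distribution of x.

    x is a single covariate cut into quartile cells. If y_lower / y_upper
    are None, each cell uses its own observed min / max (an assumption of
    this sketch, not necessarily what statspai does).
    """
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat).astype(bool)
    x = np.asarray(x, dtype=float)

    # Quartile cells: cut points at the 25th, 50th and 75th percentiles.
    cuts = np.quantile(x, [0.25, 0.5, 0.75])
    cell = np.searchsorted(cuts, x, side="right")

    lower = upper = 0.0
    for c in np.unique(cell):
        m = cell == c
        yc, dc = y[m], treat[m]
        if dc.all() or not dc.any():
            raise ValueError("each cell needs treated and control units")
        lo = yc.min() if y_lower is None else y_lower
        hi = yc.max() if y_upper is None else y_upper
        p, mu1, mu0 = dc.mean(), yc[dc].mean(), yc[~dc].mean()
        w = m.mean()  # P(X in cell)
        lower += w * (p * mu1 + (1 - p) * lo - (1 - p) * mu0 - p * hi)
        upper += w * (p * mu1 + (1 - p) * hi - (1 - p) * mu0 - p * lo)
    return lower, upper


rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
d = rng.integers(0, 2, n)
y = (rng.random(n) < 0.3 + 0.2 * d).astype(float)
print(conditional_manski_bounds(y, d, x, 0.0, 1.0))
```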

iv_bounds

iv_bounds(data: DataFrame, y: str, treatment: str, instrument: str, controls: Optional[List[str]] = None, assumption: str = 'monotone_iv', alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult

Nevo-Rosen (2012) bounds for LATE under imperfect instruments.

When the exclusion restriction may be violated, the LATE is no longer point-identified. Under weaker monotonicity assumptions on the direction of the violation, informative bounds can still be obtained.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data. | required |
| `y` | str | Outcome variable. | required |
| `treatment` | str | Endogenous treatment (binary). | required |
| `instrument` | str | Instrument variable (binary). | required |
| `controls` | list of str | Control variables (residualized out via OLS). | `None` |
| `assumption` | str | `'monotone_iv'`: the instrument's direct effect on Y has the same sign as its effect through the treatment (Nevo-Rosen Proposition 2); `'less_than_late'`: the direct effect of Z on Y is weakly smaller than the indirect effect (tighter). | `'monotone_iv'` |
| `alpha` | float | Significance level. | `0.05` |
| `n_boot` | int | Bootstrap replications. | `500` |
| `random_state` | int | Random seed for the bootstrap. | `42` |

Returns:

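| Type | Description |
| --- | --- |
| BoundsResult | Identified set with standard errors and confidence intervals (see `BoundsResult`). |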

Examples:

```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with columns 'wage', 'trained', 'lottery'
>>> result = sp.iv_bounds(
...     data=df, y="wage", treatment="trained",
...     instrument="lottery", assumption="monotone_iv",
... )
>>> print(result.summary())
```
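The sketch below is not the Nevo and Rosen (2012) estimator, which is formulated in terms of the instrument's correlation with the unobservables. It is a simpler, hypothetical illustration that reads the two `assumption` descriptions literally as restrictions on a direct effect of Z on Y. The helper `direct_effect_iv_bounds` ignores controls and inference.

```python
import numpy as np


def direct_effect_iv_bounds(y, d, z, assumption="monotone_iv"):
    """Interval for the LATE when Z may also affect Y directly.

    Reduced form: ITT_Y = direct + indirect, with indirect = LATE * ITT_D.
    'monotone_iv'    : direct effect has the same sign as the indirect one,
                       so the LATE lies between 0 and the Wald ratio.
    'less_than_late' : additionally |direct| <= |indirect|, so the LATE
                       lies between half the Wald ratio and the Wald ratio.
    """
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    itt_y = y[z == 1].mean() - y[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    wald = itt_y / itt_d  # point estimate if the exclusion restriction held

    if assumption == "monotone_iv":
        ends = (0.0, wald)
    elif assumption == "less_than_late":
        ends = (wald / 2.0, wald)
    else:
        raise ValueError("unknown assumption")
    return min(ends), max(ends)


y = [3, 3, 3, 3, 1, 1, 1, 1]
d = [1, 1, 1, 0, 1, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
print(direct_effect_iv_bounds(y, d, z))
print(direct_effect_iv_bounds(y, d, z, assumption="less_than_late"))
```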

oster_delta

oster_delta(data: DataFrame, y: str, x_base: List[str], x_controls: List[str], r_max: float = 1.3, delta_range: Tuple[float, float] = (-2.0, 2.0), n_grid: int = 200, alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult

Oster (2019) coefficient stability bounds and delta* computation.

Computes the identified set for the treatment coefficient beta under the assumption that selection on unobservables is proportional to selection on observables (delta), and R-squared would be at most r_max if all unobservables were included.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data. | required |
| `y` | str | Outcome variable. | required |
| `x_base` | list of str | Key treatment variable(s) of interest. | required |
| `x_controls` | list of str | Additional controls whose inclusion tightens identification. | required |
| `r_max` | float | Maximum R-squared assumption. Oster recommends 1.3 times the R-squared of the fully controlled regression. If <= 0, it is set to 1.3 * R_full automatically. | `1.3` |
| `delta_range` | tuple | Range of the proportional selection parameter delta. | `(-2.0, 2.0)` |
| `n_grid` | int | Grid points for delta in the identified-set computation. | `200` |
| `alpha` | float | Significance level. | `0.05` |
| `n_boot` | int | Bootstrap replications. | `500` |
| `random_state` | int | Random seed for the bootstrap. | `42` |

Returns:

| Type | Description |
| --- | --- |
| BoundsResult | `lower`/`upper` give the identified set for beta at delta = 1 (equal selection) and the given `r_max`. |

Examples:

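```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with columns 'wage', 'education', 'experience', 'tenure'
>>> result = sp.oster_delta(
...     data=df, y="wage",
...     x_base=["education"],
...     x_controls=["experience", "tenure"],
...     r_max=1.3,
... )
>>> print(result.summary())
>>> result.plot()
```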
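The widely used closed-form approximation from Oster (2019) can be sketched with plain NumPy least squares. The helpers `ols_coef_r2` and `oster_sketch` are hypothetical. This version takes `r_max` as an absolute R-squared, uses a single variable of interest, and skips the exact cubic solution and the bootstrap, so it need not match `oster_delta` numerically.

```python
import numpy as np


def ols_coef_r2(y, X):
    """OLS with an intercept; returns (coefficients, R-squared)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2


def oster_sketch(y, x, controls, r_max, delta=1.0):
    """Approximate bias-adjusted coefficient and delta* (Oster 2019).

    beta_star ~ beta_full - delta * (beta_short - beta_full)
                * (r_max - r_full) / (r_full - r_short)
    delta_star is the delta at which beta_star equals 0.
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    controls = np.asarray(controls, dtype=float)
    b_short, r_short = ols_coef_r2(y, x)
    b_full, r_full = ols_coef_r2(y, np.column_stack([x, controls]))
    b_short, b_full = b_short[1], b_full[1]  # coefficient on x

    ratio = (r_max - r_full) / (r_full - r_short)
    beta_star = b_full - delta * (b_short - b_full) * ratio
    delta_star = b_full / ((b_short - b_full) * ratio)
    return beta_star, delta_star, r_full


rng = np.random.default_rng(0)
n = 1000
w = rng.normal(size=(n, 2))                          # observed controls
x = w @ np.array([0.5, 0.3]) + rng.normal(size=n)    # variable of interest
y = 1.0 * x + w @ np.array([1.0, 0.5]) + rng.normal(size=n)

_, _, r_full = oster_sketch(y, x, w, r_max=1.0)
r_max = min(1.3 * r_full, 1.0)
beta_star, delta_star, _ = oster_sketch(y, x, w, r_max=r_max)
print(beta_star, delta_star)
```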

selection_bounds

selection_bounds(data: DataFrame, y: str, treatment: str, selection: str, covariates: Optional[List[str]] = None, method: str = 'conditional', n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult

Lee (2009) bounds for ATE under sample selection, optionally conditioning on covariates for tighter bounds.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | DataFrame | Input data. | required |
| `y` | str | Outcome variable (may be NaN when selection = 0). | required |
| `treatment` | str | Binary treatment (0/1). | required |
| `selection` | str | Binary indicator: 1 = outcome observed, 0 = missing. | required |
| `covariates` | list of str | Covariates to condition on for tighter (conditional) bounds. | `None` |
| `method` | str | `'conditional'`: compute Lee bounds within covariate strata and average them (tighter); `'unconditional'`: standard Lee bounds ignoring covariates. | `'conditional'` |
| `n_boot` | int | Bootstrap replications. | `500` |
| `alpha` | float | Significance level. | `0.05` |
| `random_state` | int | Random seed for the bootstrap. | `42` |

Returns:

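| Type | Description |
| --- | --- |
| BoundsResult | Identified set with standard errors and confidence intervals (see `BoundsResult`). |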

Examples:

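```python
>>> import statspai as sp
>>> # df: a pandas DataFrame with columns 'wage', 'trained', 'employed', 'age', 'education'
>>> result = sp.selection_bounds(
...     data=df, y="wage", treatment="trained",
...     selection="employed",
...     covariates=["age", "education"],
...     method="conditional",
... )
>>> print(result.summary())
```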
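Below is a sketch of the stratified variant with one discrete stratum label. Within each stratum it computes Lee trimming bounds. It then averages them with weights equal to the stratum's share of selected control units. Under random assignment and monotone selection, that share estimates the stratum's share of the always-observed population. The helpers `_lee` and `stratified_lee_bounds` are hypothetical, and statspai's `'conditional'` method may weight strata differently.

```python
import numpy as np


def _lee(y, d, s):
    # Unconditional Lee trimming bounds; assumes P(S=1|D=1) >= P(S=1|D=0).
    q1, q0 = s[d].mean(), s[~d].mean()
    y1 = np.sort(y[d & s])
    k = int(round((q1 - q0) / q1 * len(y1)))
    y0_mean = y[~d & s].mean()
    return y1[: len(y1) - k].mean() - y0_mean, y1[k:].mean() - y0_mean


def stratified_lee_bounds(y, d, s, strata):
    """Average within-stratum Lee bounds, weighting each stratum by its
    share of selected control units (assumed weighting choice)."""
    y = np.asarray(y, dtype=float)
    d, s = np.asarray(d).astype(bool), np.asarray(s).astype(bool)
    strata = np.asarray(strata)

    base = ~d & s  # selected controls
    lower = upper = 0.0
    for g in np.unique(strata):
        m = strata == g
        lo, hi = _lee(y[m], d[m], s[m])
        w = base[m].sum() / base.sum()
        lower += w * lo
        upper += w * hi
    return lower, upper


nan = float("nan")
y = [1, 2, 3, 4, 2, 2, nan, nan, 5, 5, 4, 4]
d = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
s = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]
g = ["a"] * 8 + ["b"] * 4
print(stratified_lee_bounds(y, d, s, g))
```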

breakdown_frontier

breakdown_frontier(estimate: float, se: float, assumption: str = 'parallel_trends', max_violation: float = 0.1, n_grid: int = 100, alpha: float = 0.05) -> BoundsResult

Masten-Poirier (2021) breakdown frontier for qualitative conclusions.

For a given point estimate and standard error, computes how much an identifying assumption can be violated before a qualitative conclusion (e.g., "positive treatment effect") breaks down.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `estimate` | float | Point estimate of the treatment effect. | required |
| `se` | float | Standard error of the estimate. | required |
| `assumption` | str | Label for the identifying assumption being relaxed. Currently supports a generic linear violation model applicable to `'parallel_trends'`, `'exclusion_restriction'`, or `'selection_on_observables'`. | `'parallel_trends'` |
| `max_violation` | float | Maximum magnitude of the assumption violation to explore. | `0.1` |
| `n_grid` | int | Grid resolution for the frontier. | `100` |
| `alpha` | float | Significance level. | `0.05` |

Returns:

| Type | Description |
| --- | --- |
| BoundsResult | `lower`/`upper` give the identified set at the maximum violation. `model_info` contains `'breakdown_point'` (the violation magnitude at which the conclusion reverses) and `'frontier_grid'` / `'frontier_bounds'` for plotting. |

Examples:

```python
>>> import statspai as sp
>>> result = sp.breakdown_frontier(
...     estimate=0.05, se=0.02,
...     assumption="parallel_trends",
...     max_violation=0.1,
... )
>>> print(result.summary())
>>> result.plot()
```
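The docstring does not spell out the "generic linear violation model". One simple reading, assumed here for illustration only, is an additive bias of at most v, so the identified set at violation v is [estimate - v, estimate + v]. The hypothetical `linear_breakdown` helper traces confidence limits over a grid of v and reports where the lower limit reaches zero.

```python
from statistics import NormalDist


def linear_breakdown(estimate, se, max_violation=0.1, n_grid=100, alpha=0.05):
    """Breakdown analysis for 'the effect is positive' under an additive bias.

    Assumed violation model: a violation of size v shifts the estimand by at
    most v in either direction, so the identified set is [est - v, est + v].
    The conclusion survives while the lower confidence limit stays above 0.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    grid = [max_violation * i / (n_grid - 1) for i in range(n_grid)]
    frontier = [(estimate - v - z * se, estimate + v + z * se) for v in grid]
    # Smallest v at which the lower confidence limit reaches zero.
    breakdown_point = max(estimate - z * se, 0.0)
    return grid, frontier, breakdown_point


grid, frontier, bp = linear_breakdown(0.05, 0.02)
print(round(bp, 4))
```

With `estimate=0.05` and `se=0.02`, this sketch puts the breakdown point at about 0.05 - 1.96 * 0.02 ≈ 0.011. A version that ignores sampling uncertainty would report the estimate itself, 0.05.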