`statspai.bounds`¶

Bounds and Partial Identification for causal effects.

When point identification is not possible (e.g., due to sample selection, non-compliance, or missing data), bounds provide informative intervals for the treatment effect.

Functions:

Name	Description
`lee_bounds`	Lee bounds: bounds on the ATE under sample selection.
`manski_bounds`	Manski bounds: worst-case bounds with minimal assumptions.
`horowitz_manski`	Horowitz-Manski bounds: tighter bounds conditioning on covariates.
`iv_bounds`	IV bounds: bounds under imperfect instruments.
`oster_delta`	Oster delta: coefficient stability / identified set.
`selection_bounds`	Selection bounds: Lee bounds with covariates.
`breakdown_frontier`	Breakdown frontier: assumption robustness frontier.

References

Lee, D. S. (2009). Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects. RES, 76(3), 1071-1102. [@lee2009training]

Manski, C. F. (1990). Nonparametric Bounds on Treatment Effects. AER P&P, 80(2), 319-323. [@manski1990nonparametric]

Horowitz, J. L. & Manski, C. F. (2000). Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data. JASA, 95(449), 77-84. [@horowitz2000nonparametric]

Nevo, A. & Rosen, A. M. (2012). Identification with Imperfect Instruments. REStat, 94(3), 659-671. [@nevo2012identification]

Oster, E. (2019). Unobservable Selection and Coefficient Stability. JBES, 37(2), 187-204. [@oster2019unobservable]

Masten, M. A. & Poirier, A. (2021). Salvaging Falsified Instrumental Variable Models. Econometrica, 89(3), 1449-1469. [@masten2021salvaging]

BoundsResult `dataclass` ¶

Bases: ResultProtocolMixin

Result container for partial identification / bounds estimation.

Attributes:

Name	Type	Description
`lower`	`float`	Lower bound of the identified set.
`upper`	`float`	Upper bound of the identified set.
`se_lower`	`float`	Standard error of the lower bound.
`se_upper`	`float`	Standard error of the upper bound.
`ci_lower`	`tuple`	Confidence interval for the lower bound (lower_lo, lower_hi).
`ci_upper`	`tuple`	Confidence interval for the upper bound (upper_lo, upper_hi).
`method`	`str`	Name of the bounding method.
`alpha`	`float`	Significance level used for confidence intervals.
`n_obs`	`int`	Number of observations used.
`model_info`	`dict`	Additional method-specific information.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> age = rng.integers(20, 60, n).astype(float)
>>> education = rng.integers(8, 18, n).astype(float)
>>> trained = rng.integers(0, 2, n)
>>> wage = 10 + 0.3 * age + 0.5 * education + 5 * trained + rng.normal(0, 3, n)
>>> df = pd.DataFrame(
...     {"wage": wage, "trained": trained,
...      "age": age, "education": education})
>>> res = sp.horowitz_manski(
...     data=df, y="wage", treatment="trained",
...     covariates=["age", "education"], n_boot=50, random_state=0)
>>> isinstance(res, sp.BoundsResult)
True
>>> bool(res.lower <= res.upper)
True
>>> bool(res.width >= 0)
True

width `property` ¶

width: float

Width of the identified set.

midpoint `property` ¶

midpoint: float

Midpoint of the identified set.

includes_zero ¶

includes_zero() -> bool

Check whether the identified set includes zero.
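The three helpers above (`width`, `midpoint`, `includes_zero`) are simple functions of `lower` and `upper`. A minimal sketch of what they plausibly compute is shown below. `IntervalSketch` is a hypothetical stand-in, not the statspai class, and treating the identified set as a closed interval in `includes_zero` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IntervalSketch:
    """Hypothetical stand-in for a bounds result; not the statspai class."""

    lower: float
    upper: float

    @property
    def width(self) -> float:
        # Length of the identified set.
        return self.upper - self.lower

    @property
    def midpoint(self) -> float:
        # Centre of the identified set.
        return 0.5 * (self.lower + self.upper)

    def includes_zero(self) -> bool:
        # Assumption: the set is closed, so an endpoint equal to zero counts.
        return self.lower <= 0.0 <= self.upper


interval = IntervalSketch(lower=-0.5, upper=1.5)
print(interval.width, interval.midpoint, interval.includes_zero())
```

An identified set that excludes zero pins down the sign of the effect even though the magnitude is only bounded.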

summary ¶

summary() -> str

Return a formatted summary string.

plot ¶

plot(ax: Any = None, **kwargs: Any) -> Any

Interval plot showing the identified set with confidence intervals.

Parameters:

Name	Type	Description	Default
`ax`	`matplotlib Axes`	Axes to draw on; created if None.	`None`
`**kwargs`	`Any`	Passed to `ax.errorbar`.	`{}`

Returns:

Type	Description
`Axes`

MLBoundsResult `dataclass` ¶

Bases: ResultProtocolMixin

ATE bounds produced by `ml_bounds`.

center_shift ¶

center_shift() -> float

Midpoint shift vs the classical Manski interval.

Under bounded-outcome Manski bounds, the width of the identification region is (y_max - y_min) by construction; ML cannot shrink it without an additional structural assumption. What ML does shift is the midpoint, because it plugs in a covariate-aware estimate of the identifiable side instead of a marginal mean. This method reports that midpoint shift.
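The claim that the worst-case width equals (y_max - y_min) can be checked numerically. The sketch below uses plain NumPy and the textbook worst-case construction (fill each unobserved potential outcome with y_min or y_max). It does not call statspai, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
y_min, y_max = 0.0, 1.0
d = rng.integers(0, 2, size=n)           # binary treatment indicator
y = rng.uniform(y_min, y_max, size=n)    # bounded outcome

p = d.mean()                             # share treated
mu1 = y[d == 1].mean()                   # observed mean among treated
mu0 = y[d == 0].mean()                   # observed mean among controls

# Worst-case bounds on E[Y(1)] and E[Y(0)]: the unobserved arm is
# replaced by the smallest or largest possible outcome.
ey1_lo, ey1_hi = mu1 * p + y_min * (1 - p), mu1 * p + y_max * (1 - p)
ey0_lo, ey0_hi = mu0 * (1 - p) + y_min * p, mu0 * (1 - p) + y_max * p

lower = ey1_lo - ey0_hi
upper = ey1_hi - ey0_lo
width = upper - lower
midpoint = 0.5 * (lower + upper)
print(round(width, 12), round(midpoint, 4))
```

The width is (y_max - y_min)(1 - p) + (y_max - y_min)p, which collapses to y_max - y_min for any treated share p. Only the observed means enter the midpoint, which is why better estimates of those means move the centre but not the width.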

lee_bounds ¶

lee_bounds(data: DataFrame, y: str, treat: str, selection: str, covariates: Optional[List[str]] = None, n_bootstrap: int = 500, alpha: float = 0.05, random_state: int = 42) -> CausalResult

Compute Lee (2009) bounds for ATE under sample selection.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data (including units with missing outcomes).	required
`y`	`str`	Outcome variable (may have NaN for selected-out units).	required
`treat`	`str`	Binary treatment variable (0/1).	required
`selection`	`str`	Binary selection/retention indicator (1 = observed, 0 = missing).	required
`covariates`	`list of str`	Not used in basic Lee bounds; reserved for conditional bounds.	`None`
`n_bootstrap`	`int`	Bootstrap iterations for inference.	`500`
`alpha`	`float`	Significance level.	`0.05`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`CausalResult`	`estimate` is the midpoint of the bounds; `ci` is the Imbens-Manski confidence interval for the identified set; `model_info` contains `'lower_bound'` and `'upper_bound'`.
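For readers unfamiliar with the Imbens-Manski interval named above, here is a minimal sketch of the usual construction: the interval is [lower - c * se_lower, upper + c * se_upper], where the critical value c solves Phi(c + (upper - lower) / max(se_lower, se_upper)) - Phi(-c) = 1 - alpha. The function name and the bisection solver are illustrative assumptions; how statspai computes it internally is not stated in this page.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF using only the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def imbens_manski_ci(lower, upper, se_lower, se_upper, alpha=0.05):
    """Hypothetical helper: returns (ci_low, ci_high, critical_value)."""
    delta = (upper - lower) / max(se_lower, se_upper)

    def gap(c: float) -> float:
        # Increasing in c; the root is the critical value.
        return norm_cdf(c + delta) - norm_cdf(-c) - (1.0 - alpha)

    lo, hi = 0.0, 10.0
    for _ in range(200):                  # plain bisection
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return lower - c * se_lower, upper + c * se_upper, c


_, _, c_point = imbens_manski_ci(1.0, 1.0, 0.2, 0.2)    # zero-width set
_, _, c_wide = imbens_manski_ci(0.0, 100.0, 1.0, 1.0)   # very wide set
print(round(c_point, 3), round(c_wide, 3))
```

The critical value moves between the two-sided value (about 1.96 at alpha = 0.05) when the set has zero width and the one-sided value (about 1.645) when the set is wide relative to the standard errors.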

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(1)
>>> n = 600
>>> training = rng.integers(0, 2, size=n)
>>> # employment (selection) is higher among the treated -> differential attrition
>>> employed = (rng.uniform(size=n)
...             < np.where(training == 1, 0.8, 0.6)).astype(int)
>>> wage = 10.0 + 2.0 * training + rng.normal(size=n)
>>> wage = np.where(employed == 1, wage, np.nan)   # wage missing if not employed
>>> df = pd.DataFrame({'wage': wage, 'training': training, 'employed': employed})
>>> result = sp.lee_bounds(df, y='wage', treat='training',
...                        selection='employed', n_bootstrap=100)
>>> bool(result.model_info['lower_bound'] <= result.model_info['upper_bound'])
True
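The trimming idea behind Lee (2009) bounds can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the statspai implementation: `lee_bounds_sketch` is a hypothetical helper, it trims an integer number of observations rather than interpolating quantiles, and it ignores inference.

```python
import numpy as np


def lee_bounds_sketch(y, treat, selected):
    """Hypothetical helper: trimming bounds on the effect for always-selected units."""
    y, treat, selected = map(np.asarray, (y, treat, selected))
    y1 = np.sort(y[(treat == 1) & (selected == 1)])   # observed treated outcomes
    y0 = np.sort(y[(treat == 0) & (selected == 1)])   # observed control outcomes
    q1 = selected[treat == 1].mean()                  # selection rate, treated
    q0 = selected[treat == 0].mean()                  # selection rate, control

    if q1 >= q0:
        # Excess selection among treated: trim the treated group.
        k = int(np.floor((q1 - q0) / q1 * len(y1)))
        lower = y1[: len(y1) - k].mean() - y0.mean()  # drop the k largest
        upper = y1[k:].mean() - y0.mean()             # drop the k smallest
    else:
        # Excess selection among controls: trim the control group.
        k = int(np.floor((q0 - q1) / q0 * len(y0)))
        lower = y1.mean() - y0[k:].mean()
        upper = y1.mean() - y0[: len(y0) - k].mean()
    return lower, upper


rng = np.random.default_rng(1)
n = 5_000
training = rng.integers(0, 2, size=n)
employed = (rng.uniform(size=n) < np.where(training == 1, 0.8, 0.6)).astype(int)
wage = np.where(employed == 1, 10.0 + 2.0 * training + rng.normal(size=n), np.nan)

lower, upper = lee_bounds_sketch(wage, training, employed)
print(round(lower, 2), round(upper, 2))
```

Trimming removes the share of the over-selected group that could consist of units induced into the sample by treatment, once from the top and once from the bottom of the outcome distribution, which yields the two bounds.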

manski_bounds ¶

manski_bounds(data: DataFrame, y: str, treat: str, y_lower: Optional[float] = None, y_upper: Optional[float] = None, assumption: str = 'none', alpha: float = 0.05, n_bootstrap: int = 500, random_state: int = 42) -> CausalResult

Compute Manski (1990) worst-case bounds on ATE.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable.	required
`treat`	`str`	Binary treatment variable (0/1).	required
`y_lower`	`float`	Known lower bound of the outcome. If None, uses observed min.	`None`
`y_upper`	`float`	Known upper bound of the outcome. If None, uses observed max.	`None`
`assumption`	`str`	Additional assumption: `'none'` imposes no assumptions (widest bounds); `'mtr'` imposes Monotone Treatment Response (Y(1) >= Y(0) for all units); `'mts'` imposes Monotone Treatment Selection (selection on levels).	`'none'`
`alpha`	`float`	Significance level.	`0.05`
`n_bootstrap`	`int`	Bootstrap iterations for inference.	`500`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`CausalResult`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(2)
>>> n = 600
>>> training = rng.integers(0, 2, size=n)
>>> employed = (rng.uniform(size=n)
...            < np.where(training == 1, 0.7, 0.5)).astype(int)
>>> df = pd.DataFrame({'employed': employed, 'training': training})
>>> result = sp.manski_bounds(df, y='employed', treat='training',
...                           y_lower=0, y_upper=1, n_bootstrap=100)
>>> bool(result.model_info['lower_bound'] <= result.model_info['upper_bound'])
True

horowitz_manski ¶

horowitz_manski(data: DataFrame, y: str, treatment: str, covariates: List[str], y_lower: Optional[float] = None, y_upper: Optional[float] = None, n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult

Horowitz-Manski (2000) bounds conditioning on covariates.

Tighter than unconditional Manski bounds by averaging conditional bounds over the covariate distribution.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable.	required
`treatment`	`str`	Binary treatment variable (0/1).	required
`covariates`	`list of str`	Covariates to condition on (discretised via quartiles for continuous variables).	required
`y_lower`	`float`	Known lower bound of Y. Defaults to observed min.	`None`
`y_upper`	`float`	Known upper bound of Y. Defaults to observed max.	`None`
`n_boot`	`int`	Bootstrap replications.	`500`
`alpha`	`float`	Significance level.	`0.05`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`BoundsResult`

Notes

Bootstrap replicates that fail are dropped from the SE/CI computation; a RuntimeWarning reports the failed fraction and model_info['n_boot_failed'] records the count.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> trained = rng.binomial(1, 0.5, n)
>>> wage = 50 + 10 * trained + rng.normal(0, 15, n)
>>> df = pd.DataFrame({"wage": wage, "trained": trained,
...                    "age": rng.normal(40, 10, n)})
>>> result = sp.horowitz_manski(
...     data=df, y="wage", treatment="trained", covariates=["age"],
...     y_lower=float(df["wage"].min()), y_upper=float(df["wage"].max()),
... )
>>> result.summary()

iv_bounds ¶

iv_bounds(data: DataFrame, y: str, treatment: str, instrument: str, controls: Optional[List[str]] = None, assumption: str = 'monotone_iv', alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult

Nevo-Rosen (2012) bounds for LATE under imperfect instruments.

When the exclusion restriction may be violated, the LATE is no longer point-identified. Under weaker monotonicity assumptions on the direction of the violation, informative bounds can still be obtained.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable.	required
`treatment`	`str`	Endogenous treatment (binary).	required
`instrument`	`str`	Instrument variable (binary).	required
`controls`	`list of str`	Control variables (residualized out via OLS).	`None`
`assumption`	`str`	`'monotone_iv'`: instrument has same-sign direct effect as through the treatment (Nevo-Rosen Proposition 2). `'less_than_late'`: direct effect of Z on Y is weakly less than the indirect effect (tighter).	`'monotone_iv'`
`alpha`	`float`	Significance level.	`0.05`
`n_boot`	`int`	Bootstrap replications.	`500`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`BoundsResult`

Notes

Bootstrap replicates that fail are dropped from the SE/CI computation; a RuntimeWarning reports the failed fraction and model_info['n_boot_failed'] records the count.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> lottery = rng.binomial(1, 0.5, n)
>>> trained = ((lottery + rng.normal(0, 1, n)) > 0.5).astype(int)
>>> employed = ((0.3 * trained + rng.normal(0, 1, n)) > 0).astype(int)
>>> df = pd.DataFrame({"employed": employed, "trained": trained,
...                    "lottery": lottery})
>>> result = sp.iv_bounds(
...     data=df, y="employed", treatment="trained",
...     instrument="lottery", assumption="monotone_iv",
... )
>>> result.summary()

oster_delta ¶

oster_delta(data: DataFrame, y: str, x_base: List[str], x_controls: List[str], r_max: float = 1.3, delta_range: Tuple[float, float] = (-2.0, 2.0), n_grid: int = 200, alpha: float = 0.05, n_boot: int = 500, random_state: int = 42) -> BoundsResult

Oster (2019) coefficient stability bounds and delta* computation.

Computes the identified set for the treatment coefficient beta under the assumption that selection on unobservables is proportional to selection on observables (delta), and R-squared would be at most r_max if all unobservables were included.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable.	required
`x_base`	`list of str`	Key treatment/variable(s) of interest.	required
`x_controls`	`list of str`	Additional controls whose inclusion tightens identification.	required
`r_max`	`float`	Maximum R-squared assumption. Oster recommends 1.3 * R-squared from the fully controlled regression. If <= 0, it is set to 1.3 * R_full automatically.	`1.3`
`delta_range`	`tuple`	Range of proportional selection parameter delta.	`(-2.0, 2.0)`
`n_grid`	`int`	Grid points for delta in the identified set computation.	`200`
`alpha`	`float`	Significance level.	`0.05`
`n_boot`	`int`	Bootstrap replications.	`500`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`BoundsResult`	lower/upper give the identified set for beta at delta=1 (equal selection) and the given r_max.

Notes

Bootstrap replicates that fail are dropped from the SE/CI computation; a RuntimeWarning reports the failed fraction and model_info['n_boot_failed'] records the count.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> education = rng.normal(12, 3, n)
>>> experience = rng.normal(10, 5, n)
>>> tenure = rng.normal(5, 3, n)
>>> wage = (2 * education + 1.5 * experience + 0.8 * tenure
...         + rng.normal(0, 5, n))
>>> df = pd.DataFrame({"wage": wage, "education": education,
...                    "experience": experience, "tenure": tenure})
>>> result = sp.oster_delta(
...     data=df, y="wage",
...     x_base=["education"],
...     x_controls=["experience", "tenure"],
...     r_max=1.3,
... )
>>> result.summary()
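To make the role of delta and r_max concrete, the sketch below applies the well-known approximation from Oster (2019), beta* ≈ beta_full - delta * (beta_short - beta_full) * (r_max - R2_full) / (R2_full - R2_short), using plain NumPy OLS. It is an illustration only: `coef_and_r2` is a hypothetical helper, capping r_max at 1.0 is an assumption, and statspai may solve the exact (non-approximate) problem instead.

```python
import numpy as np


def coef_and_r2(y, X):
    """OLS with intercept; returns the coefficient on X's first column and R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta[1], 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n = 500
experience = rng.normal(10, 5, n)
# Education is correlated with the control, so omitting it biases the short regression.
education = 12 + 0.3 * (experience - 10) + rng.normal(0, 2, n)
wage = 2 * education + 1.5 * experience + rng.normal(0, 5, n)

b_short, r2_short = coef_and_r2(wage, education[:, None])
b_full, r2_full = coef_and_r2(wage, np.column_stack([education, experience]))

delta = 1.0                                  # equal selection
r_max = min(1.3 * r2_full, 1.0)              # assumption: cap at 1.0
beta_star = b_full - delta * (b_short - b_full) * (r_max - r2_full) / (r2_full - r2_short)

lower, upper = sorted([b_full, beta_star])   # identified set at delta = 1
print(round(lower, 3), round(upper, 3))
```

The further the coefficient moves and the less R-squared rises when controls are added, the wider the resulting set.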

selection_bounds ¶

selection_bounds(data: DataFrame, y: str, treatment: str, selection: str, covariates: Optional[List[str]] = None, method: str = 'conditional', n_boot: int = 500, alpha: float = 0.05, random_state: int = 42) -> BoundsResult

Lee (2009) bounds for ATE under sample selection, optionally conditioning on covariates for tighter bounds.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable (may have NaN when selection=0).	required
`treatment`	`str`	Binary treatment (0/1).	required
`selection`	`str`	Binary indicator: 1 = outcome observed, 0 = missing.	required
`covariates`	`list of str`	Covariates to condition on for tighter (conditional) bounds.	`None`
`method`	`str`	`'conditional'`: compute Lee bounds within covariate strata and average (tighter). `'unconditional'`: standard Lee bounds ignoring covariates.	`'conditional'`
`n_boot`	`int`	Bootstrap replications.	`500`
`alpha`	`float`	Significance level.	`0.05`
`random_state`	`int`	Seed for the bootstrap random number generator.	`42`

Returns:

Type	Description
`BoundsResult`

Notes

Bootstrap replicates that fail (or yield NaN bounds, e.g. a stratum losing all treated or selected units in a resample) are dropped from the SE/CI computation; a RuntimeWarning reports the failed fraction and model_info['n_boot_failed'] records the count.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> trained = rng.binomial(1, 0.5, n)
>>> employed = rng.binomial(1, 0.8, n)
>>> wage = 50 + 10 * trained + rng.normal(0, 10, n)
>>> df = pd.DataFrame({"wage": np.where(employed == 1, wage, np.nan),
...                    "trained": trained, "employed": employed,
...                    "age": rng.normal(40, 10, n)})
>>> result = sp.selection_bounds(
...     data=df, y="wage", treatment="trained",
...     selection="employed",
...     covariates=["age"],
...     method="conditional",
... )
>>> result.summary()

breakdown_frontier ¶

breakdown_frontier(estimate: float, se: float, assumption: str = 'parallel_trends', max_violation: float = 0.1, n_grid: int = 100, alpha: float = 0.05) -> BoundsResult

Masten-Poirier (2021) breakdown frontier for qualitative conclusions.

For a given point estimate and standard error, computes how much an identifying assumption can be violated before a qualitative conclusion (e.g., "positive treatment effect") breaks down.

Parameters:

Name	Type	Description	Default
`estimate`	`float`	Point estimate of the treatment effect.	required
`se`	`float`	Standard error of the estimate.	required
`assumption`	`str`	Label for the identifying assumption being relaxed. Currently supports a generic linear violation model applicable to `'parallel_trends'`, `'exclusion_restriction'`, or `'selection_on_observables'`.	`'parallel_trends'`
`max_violation`	`float`	Maximum magnitude of the assumption violation to explore.	`0.1`
`n_grid`	`int`	Grid resolution for the frontier.	`100`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`BoundsResult`	lower/upper give the identified set at the maximum violation. `model_info` contains `'breakdown_point'` (the violation magnitude at which the conclusion reverses) and `'frontier_grid'` / `'frontier_bounds'` for plotting.

Examples:

>>> import statspai as sp
>>> result = sp.breakdown_frontier(
...     estimate=0.05, se=0.02,
...     assumption="parallel_trends",
...     max_violation=0.1,
... )
>>> result.summary()
>>> result.plot()
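One way to read the "generic linear violation model" is that a violation of magnitude c widens the identified set to [estimate - c, estimate + c]. The sketch below traces that frontier and reports where the sign conclusion breaks, with and without sampling uncertainty. `breakdown_sketch` and its return keys are hypothetical, and the exact definition statspai uses for `'breakdown_point'` is not stated in this page.

```python
from statistics import NormalDist

import numpy as np


def breakdown_sketch(estimate, se, max_violation=0.1, n_grid=100, alpha=0.05):
    """Hypothetical helper: linear-violation frontier for a sign conclusion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    grid = np.linspace(0.0, max_violation, n_grid)   # violation magnitudes c
    lower = estimate - grid                          # identified set at each c
    upper = estimate + grid
    return {
        "grid": grid,
        "lower": lower,
        "upper": upper,
        "ci_lower": lower - z * se,                  # set widened by sampling noise
        "ci_upper": upper + z * se,
        # Smallest c at which the identified set reaches zero.
        "breakdown_point": abs(estimate),
        # Smallest c at which the confidence band reaches zero (floored at 0).
        "breakdown_point_ci": max(abs(estimate) - z * se, 0.0),
    }


out = breakdown_sketch(estimate=0.05, se=0.02)
print(out["breakdown_point"], round(out["breakdown_point_ci"], 4))
```

Under this reading, an estimate of 0.05 keeps its sign for violations smaller than 0.05, but once the standard error of 0.02 is taken into account the conclusion is only robust to violations of roughly 0.011.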
