statspai.proximal

proximal

Proximal Causal Inference (Tchetgen Tchetgen et al. 2020).

Identifies the ATE in the presence of an unmeasured confounder U, using two proxies of U:

  • Z: "treatment-inducing confounding proxy" (independent of Y | D, U)
  • W: "outcome-inducing confounding proxy" (independent of D | U)

plus measured covariates X.
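
For orientation, the standard outcome-bridge identification result from the proximal literature (Miao et al. 2018; Tchetgen Tchetgen et al. 2020) can be written as below for a binary treatment D. This is a sketch of the general argument, not a statement of what each function in this module computes; the bridge symbol h and the coefficient names are notation introduced here.

```latex
% Outcome confounding bridge h: a solution of the integral equation
\mathbb{E}[Y \mid D, Z, X] = \mathbb{E}[h(W, D, X) \mid D, Z, X]

% Under the proxy independence conditions and a completeness condition,
\mathrm{ATE} = \mathbb{E}\bigl[h(W, 1, X) - h(W, 0, X)\bigr]

% Linear special case: the ATE is the coefficient on D
h(W, D, X) = \beta_0 + \beta_D D + \beta_W^{\top} W + \beta_X^{\top} X
\quad\Longrightarrow\quad \mathrm{ATE} = \beta_D
```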

ProximalCausalInference

Class wrapper for the proximal function.

NegativeControlResult dataclass

Unified result for negative-control procedures.

ProxyScoreResult dataclass

Per-candidate proxy score for PCI.

proximal

proximal(data: DataFrame, y: str, treat: str, proxy_z: List[str], proxy_w: List[str], covariates: Optional[List[str]] = None, bridge: str = 'linear', n_boot: int = 0, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult

Proximal causal inference via linear 2SLS on the outcome bridge.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| treat | str | Treatment variable (binary or continuous). | required |
| proxy_z | list of str | Treatment-inducing confounding proxy variable(s) (Z). These serve as instruments for the outcome proxy W. | required |
| proxy_w | list of str | Outcome-inducing confounding proxy variable(s) (W). Endogenous regressors in the linear bridge. | required |
| covariates | list of str | Measured baseline covariates X (exogenous controls). | None |
| bridge | str | Functional form of the outcome-confounding bridge. Only 'linear' is currently implemented: the linear 2SLS estimator described in Cui et al. (2024, §4). Kernel-based bridges (Mastouri et al. 2021) and sieve/RKHS non-parametric bridges (Deaner 2018) are planned for a future release and will then become accepted values. Until then, any value other than 'linear' raises NotImplementedError, so that a request for an unimplemented bridge fails loudly instead of silently falling back to the linear bridge and mis-attributing results. | 'linear' |
| n_boot | int | If > 0, standard errors come from a nonparametric bootstrap that resamples rows (not cluster-robust). If 0, the closed-form 2SLS sandwich SE (homoskedastic) is used. | 0 |
| alpha | float | | 0.05 |
| seed | int | | None |
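
The n_boot option describes a nonparametric bootstrap over rows. A minimal sketch of that idea, assuming a generic scalar estimator, might look as follows. The helper row_bootstrap_se and its arguments are hypothetical names used for illustration and are not part of statspai.

```python
import numpy as np

def row_bootstrap_se(data, estimator, n_boot=200, seed=None):
    """Standard error of `estimator(data)` from resampling rows with replacement.

    `data` is a 2-D array with one observation per row; `estimator` maps such
    an array to a scalar. Both names are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # n row indices drawn with replacement
        draws[b] = estimator(data[idx])
    return draws.std(ddof=1)               # spread of the bootstrap estimates

# Toy check: the bootstrap SE of a sample mean should be close to s / sqrt(n).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
boot_se = row_bootstrap_se(x, lambda d: d[:, 0].mean(), n_boot=500, seed=1)
analytic_se = x[:, 0].std(ddof=1) / np.sqrt(len(x))
```

Because whole rows are resampled independently, a scheme like this ignores any clustering in the data, which matches the "not cluster-robust" caveat in the n_boot description.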

Returns:

| Type | Description |
|---|---|
| CausalResult | estimate is the coefficient on the treatment in the linear bridge, which is the proximal ATE under correct specification. |

Examples:

>>> import statspai as sp
>>> # df: a pandas DataFrame containing the columns named below.
>>> # ATE of smoking on lung cancer, with occupation (Z) and
>>> # secondhand-smoke exposure (W) as proxies for unmeasured
>>> # health behaviour/genetics.
>>> sp.proximal(df, y='lung_cancer', treat='smoker',
...             proxy_z=['occupation'], proxy_w=['shs_exposure'],
...             covariates=['age', 'sex'])
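
To see what "linear 2SLS on the outcome bridge" amounts to, here is a minimal NumPy sketch on simulated data. It is not the library's implementation: the data-generating process, variable names, and coefficients are assumptions chosen for illustration. W enters as the endogenous regressor and Z as its instrument, following the parameter descriptions for proxy_w and proxy_z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_ate = 2.0

# Simulated data (all names and coefficients are illustrative assumptions).
u = rng.normal(size=n)                          # unmeasured confounder U
z = u + rng.normal(size=n)                      # treatment-side proxy Z
w = u + rng.normal(size=n)                      # outcome-side proxy W
d = (u + rng.normal(size=n) > 0).astype(float)  # binary treatment driven by U
y = true_ate * d + 1.5 * u + rng.normal(size=n)

ones = np.ones(n)

# Naive OLS of Y on D ignores U, so its slope is confounded.
X_naive = np.column_stack([ones, d])
naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

# Linear bridge: Y = b0 + bD*D + bW*W + error, with W endogenous.
# Instruments are (1, D, Z). Just-identified IV: beta = (Z'X)^{-1} Z'y.
X = np.column_stack([ones, d, w])
Zmat = np.column_stack([ones, d, z])
beta = np.linalg.solve(Zmat.T @ X, Zmat.T @ y)
proximal_ate = beta[1]                          # coefficient on the treatment

print(f"naive OLS: {naive:.2f}, linear-bridge 2SLS: {proximal_ate:.2f}, truth: {true_ate}")
```

In this simulation the naive regression absorbs the effect of U into the coefficient on D, while the instrumented bridge coefficient should land near the true value of 2.0.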

negative_control_outcome

negative_control_outcome(data: DataFrame, nco: str, treat: str, covariates: Optional[Sequence[str]] = None, alpha: float = 0.05) -> NegativeControlResult

Lipsitch-style NCO calibration.

Fit an OLS of the negative-control outcome nco on treat and optional covariates. A coefficient significantly different from zero signals residual confounding that the measured covariates failed to control for.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| nco | str | Negative-control outcome: a variable plausibly unaffected by the true treatment but sharing confounders with the real Y. | required |
| treat | str | Treatment indicator or exposure variable. | required |
| covariates | sequence of str | Measured confounders to condition on. | None |
| alpha | float | | 0.05 |

Returns:

| Type | Description |
|---|---|
| NegativeControlResult | |
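
The check described for negative_control_outcome can be sketched with plain NumPy: regress the negative-control outcome on the treatment and covariates, then test the treatment coefficient. The helper ols_coef_test, the simulated data, and the normal-approximation p-value are all assumptions made for this illustration; the source does not specify how the library computes its test.

```python
import math
import numpy as np

def ols_coef_test(y, X, col):
    """OLS of y on X; return (coef, se, two-sided p-value) for column `col`.

    Uses homoskedastic standard errors and a normal approximation.
    Hypothetical helper, for illustration only.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = math.sqrt(cov[col, col])
    z_stat = beta[col] / se
    p = math.erfc(abs(z_stat) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return beta[col], se, p

rng = np.random.default_rng(0)
n = 5_000
u = rng.normal(size=n)                            # unmeasured confounder
x = rng.normal(size=n)                            # measured covariate
treat = (0.8 * u + 0.5 * x + rng.normal(size=n) > 0).astype(float)
nco = 1.0 * u + 0.3 * x + rng.normal(size=n)      # no causal effect of treat

X = np.column_stack([np.ones(n), treat, x])
coef, se, p = ols_coef_test(nco, X, col=1)
flagged = p < 0.05                                # True suggests residual confounding
```

Here treat has no causal effect on nco by construction, so a non-zero coefficient can only come from the shared dependence on u that conditioning on x does not remove.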

negative_control_exposure

negative_control_exposure(data: DataFrame, y: str, nce: str, covariates: Optional[Sequence[str]] = None, alpha: float = 0.05) -> NegativeControlResult

Regress outcome on a negative-control exposure.

A significant coefficient on nce, which by design is assumed not to affect y causally, indicates residual confounding along the exposure axis (selection, measurement error, etc.).

double_negative_control

double_negative_control(data: DataFrame, y: str, treat: str, nce: str, nco: str, covariates: Optional[Sequence[str]] = None, alpha: float = 0.05) -> NegativeControlResult

Double negative control estimator (Miao et al. 2018; Shi et al. 2020).

Under the linear / index model:

    Y   = α0 + α_D D + α_U U + α_X X + ε_Y
    NCO = β0 + β_U U + β_X X + ε_W
    E[U | NCE, X, D] is linear in (NCE, X, D)

(plus standard independence/exclusion conditions), the ATE is point-identified by IV-regressing Y on (D, NCO, X) using (D, NCE, X) as instruments: NCE instruments for the proxy NCO, breaking the dependence on U. The coefficient on D is the de-biased ATE.

This is implemented as a just-identified 2SLS. Under the assumptions above, the fitted ATE is asymptotically unbiased and agrees with the closed-form expression in Shi et al. (2020, §3).
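
The IV regression described here can be sketched in a few lines of NumPy. This is an illustration under assumed simulated data, not the library's code: the jointly Gaussian design (chosen so that E[U | NCE, X, D] is linear), the coefficient values, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
alpha_d = 1.0                                     # true effect of D on Y

# Jointly Gaussian design, so E[U | NCE, X, D] is linear in (NCE, X, D).
u = rng.normal(size=n)                            # unmeasured confounder
x = rng.normal(size=n)                            # measured covariate
nce = u + rng.normal(size=n)                      # no direct effect on Y or NCO
d = 0.7 * u + 0.3 * x + rng.normal(size=n)        # continuous treatment
nco = 0.5 + 1.2 * u + 0.4 * x + rng.normal(size=n)
y = 0.2 + alpha_d * d + 1.5 * u + 0.6 * x + rng.normal(size=n)

ones = np.ones(n)
regressors = np.column_stack([ones, d, nco, x])   # (1, D, NCO, X)
instruments = np.column_stack([ones, d, nce, x])  # (1, D, NCE, X)

# Just-identified IV estimator: beta = (Z'X)^{-1} Z'y.
beta = np.linalg.solve(instruments.T @ regressors, instruments.T @ y)
dnc_ate = beta[1]                                 # coefficient on D

# For contrast: OLS with NCO as an ordinary control leaves residual bias,
# because NCO is only a noisy proxy for U.
ols_ate = np.linalg.lstsq(regressors, y, rcond=None)[0][1]
```

In this design the IV coefficient on D should be close to 1.0, while the OLS coefficient stays biased upward because controlling for a noisy proxy removes only part of the dependence on U.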

proximal_regression

proximal_regression(data: DataFrame, y: str, treat: str, z_proxy: str, w_proxy: str, covariates: Optional[Sequence[str]] = None, alpha: float = 0.05, propensity_bounds: tuple = (0.02, 0.98)) -> ProximalRegResult

Doubly-robust regression-based PCI estimator for the ATE.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome column. | required |
| treat | str | Binary treatment column. | required |
| z_proxy | str | Treatment-inducing confounding proxy Z. | required |
| w_proxy | str | Outcome-inducing confounding proxy W. | required |
| covariates | sequence of str | Measured covariates X. | None |
| alpha | float | | 0.05 |
| propensity_bounds | (float, float) | | (0.02, 0.98) |

Returns:

| Type | Description |
|---|---|
| ProximalRegResult | |

fortified_pci

fortified_pci(data: DataFrame, y: str, treat: str, proxy_z: List[str], proxy_w: List[str], covariates: Optional[List[str]] = None, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> CausalResult

Fortified Proximal Causal Inference (doubly-robust PCI).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| treat | str | | required |
| proxy_z | list of str | Treatment-side proxies (instruments for W). | required |
| proxy_w | list of str | Outcome-side proxies (endogenous bridge regressors). | required |
| covariates | list of str | | None |
| alpha | float | | 0.05 |
| n_boot | int | Bootstrap reps for SE. | 200 |
| seed | int | | 0 |

Returns:

| Type | Description |
|---|---|
| CausalResult | ATE estimate that is doubly robust to bridge / outcome misspecification. |

References

Yu, Shi & Tchetgen Tchetgen (2025). Fortified Proximal Causal Inference with Many Invalid Proxies. arXiv:2506.13152.

bidirectional_pci

bidirectional_pci(data: DataFrame, y: str, treat: str, proxy_z: List[str], proxy_w: List[str], covariates: Optional[List[str]] = None, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> CausalResult

Bidirectional PCI: simultaneous outcome + treatment bridge.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Same as in statspai.proximal.proximal. | required |
| y | str | Same as in statspai.proximal.proximal. | required |
| treat | str | Same as in statspai.proximal.proximal. | required |
| proxy_z | list of str | Same as in statspai.proximal.proximal. | required |
| proxy_w | list of str | Same as in statspai.proximal.proximal. | required |
| covariates | list of str | Same as in statspai.proximal.proximal. | None |
| alpha | float | | 0.05 |
| n_boot | int | | 200 |
| seed | int | | 0 |

Returns:

| Type | Description |
|---|---|
| CausalResult | ATE estimate from the bidirectional moment condition. |

pci_mtp

pci_mtp(data: DataFrame, y: str, treat: str, proxy_z: List[str], proxy_w: List[str], delta: float, covariates: Optional[List[str]] = None, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> CausalResult

PCI for Modified Treatment Policies (continuous-shift effect).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| treat | str | Continuous treatment variable. | required |
| proxy_z | list of str | Treatment-inducing confounding proxies (Z), as in standard PCI. | required |
| proxy_w | list of str | Outcome-inducing confounding proxies (W), as in standard PCI. | required |
| delta | float | MTP shift; the estimand is E[Y(D + δ)] - E[Y(D)]. | required |
| covariates | list of str | | None |
| alpha | float | | 0.05 |
| n_boot | int | | 200 |
| seed | int | | 0 |

Returns:

| Type | Description |
|---|---|
| CausalResult | |
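
To make the shift estimand concrete, consider the special case of a linear outcome bridge. This is an illustrative assumption: the source does not state which bridge pci_mtp fits, and the symbol h and the coefficients below are notation introduced here.

```latex
% Assumed linear outcome bridge
h(W, D, X) = \beta_0 + \beta_D D + \beta_W^{\top} W + \beta_X^{\top} X

% Shift effect under the modified treatment policy D -> D + \delta
\mathbb{E}[Y(D + \delta)] - \mathbb{E}[Y(D)]
  = \mathbb{E}[h(W, D + \delta, X)] - \mathbb{E}[h(W, D, X)]
  = \beta_D \, \delta
```

Under that assumption the shift effect is the bridge coefficient on the treatment scaled by δ; with a non-linear bridge the two expectations would have to be evaluated separately.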

select_pci_proxies

select_pci_proxies(data: DataFrame, y: str, treat: str, candidates: List[str], covariates: Optional[List[str]] = None, top_k: int = 2) -> ProxyScoreResult

Score and rank candidate proxies for PCI.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| treat | str | | required |
| candidates | list of str | All variables that could plausibly serve as proxies. | required |
| covariates | list of str | | None |
| top_k | int | Number of top candidates to recommend per side. | 2 |

Returns:

| Type | Description |
|---|---|
| ProxyScoreResult | |