`statspai.ope`

ope

Off-Policy Evaluation (sp.ope): estimate the value of a target policy from data collected under a different behaviour policy. Covers contextual bandits and off-policy reinforcement learning evaluation.

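Implemented: the Direct Method (DM), Inverse Propensity Scoring (IPS), Self-Normalized IPS (SNIPS), Doubly Robust (DR) and Switch-DR estimators; sharp OPE under unobserved confounding (Hess, Frauen, Melnychuk & Feuerriegel 2025, arXiv:2502.13022); and the causal-policy forest (Kato 2025, arXiv:2512.22846).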

OPEResult `dataclass`

Bases: ResultProtocolMixin

Canonical Off-Policy Evaluation result.

All `sp.ope.*` estimators, as well as the top-level `sp.direct_method`, `sp.ips`, `sp.snips` and `sp.doubly_robust`, return this class. (The `policy_learning` `OPEResult` subclass is a thin alias that only adds an `estimator` attribute for backward compatibility.) Consequently, `isinstance(res, sp.OPEResult)` holds for results from either entry point.

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n, K = 500, 3
>>> pi_b = np.full((n, K), 1.0 / K)          # uniform behaviour policy
>>> pi_e = np.tile([0.6, 0.3, 0.1], (n, 1))  # evaluation policy to score
>>> actions = np.array([rng.choice(K, p=pi_b[i]) for i in range(n)])
>>> rewards = (actions == 0).astype(float) + rng.normal(0, 0.1, n)
>>> res = sp.ope.ips(actions, rewards, pi_b, pi_e)
>>> type(res).__name__
'OPEResult'
>>> res.method
'IPS'
>>> bool(isinstance(res, sp.OPEResult))
True

estimator `property`

estimator: str

Backwards-compatible alias for the `method` attribute.

n_obs `property`

n_obs: int

Number of observations: `diagnostics['n']` if that key is present, otherwise `diagnostics['n_obs']` if that key is present.
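The two properties above amount to a name alias and an ordered dictionary lookup. The sketch below uses a hypothetical, stripped-down stand-in (`MiniOPEResult`, with only `method` and `diagnostics` fields) to illustrate that behaviour; it is not the real class, whose other fields and `ResultProtocolMixin` base are omitted, and returning `None` when neither key exists is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class MiniOPEResult:
    """Hypothetical stand-in for OPEResult, keeping only the fields used here."""
    method: str
    diagnostics: dict[str, Any] = field(default_factory=dict)

    @property
    def estimator(self) -> str:
        # Backwards-compatible alias: simply returns `method`.
        return self.method

    @property
    def n_obs(self) -> Optional[int]:
        # Try diagnostics['n'] first, then diagnostics['n_obs'].
        # Returning None when neither key exists is an assumption.
        for key in ("n", "n_obs"):
            if key in self.diagnostics:
                return int(self.diagnostics[key])
        return None


res = MiniOPEResult(method="IPS", diagnostics={"n_obs": 500})
print(res.estimator, res.n_obs)  # IPS 500
```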

direct_method

direct_method(reward_model: RewardModel, X: ndarray, pi_e: ndarray) -> OPEResult

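Plug-in (Direct Method) estimator using a fitted reward model Q(x, a): it averages the model's predictions weighted by the evaluation policy, (1/n) Σ_i Σ_a pi_e(a | x_i) Q(x_i, a).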

Parameters:

Name	Type	Description	Default
`reward_model`	`callable`	`reward_model(X, a)` → vector of length n predicting E[R | X, a].	required
`X`	`(n, d) array`	Context (covariate) matrix, one row per logged observation.	required
`pi_e`	`(n, K) array`	Evaluation policy probabilities over K actions at each X_i.	required
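A minimal NumPy sketch of the Direct Method formula, assuming `reward_model(X, a)` behaves as described in the table (one prediction per row of `X`). The function name `direct_method_sketch` and the toy model `toy_q` are hypothetical and only illustrate the computation, not the library's code.

```python
import numpy as np


def direct_method_sketch(reward_model, X, pi_e):
    """Direct Method: average over i of sum_a pi_e[i, a] * Q(X_i, a)."""
    K = pi_e.shape[1]
    # Column a holds the model's predicted reward for action a at every row of X.
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


def toy_q(X, a):
    # Hypothetical reward model: action 0 is worth 1, the others 0.
    return np.full(len(X), 1.0 if a == 0 else 0.0)


X = np.zeros((4, 2))
pi_e = np.tile([0.6, 0.3, 0.1], (4, 1))
print(round(direct_method_sketch(toy_q, X, pi_e), 6))  # 0.6
```

No logged actions or rewards enter the estimate, which is why the DM has low variance but inherits any bias in the reward model.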

ips

ips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult

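Inverse Propensity Scoring (IPS, also known as the Horvitz-Thompson estimator): reweights each logged reward by the importance ratio pi_e(a_i | x_i) / pi_b(a_i | x_i) and averages over the n observations.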
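A minimal sketch of the IPS computation with NumPy. Reading `clip` as an upper cap on the importance ratio (and `clip=None` as "no cap") is an assumption about the parameter, not documented behaviour; `ips_sketch` is a hypothetical name. The data mimic the `OPEResult` example above, where the true value of `pi_e` is 0.6.

```python
import numpy as np


def ips_sketch(actions, rewards, pi_b, pi_e, clip=1000.0):
    """IPS: mean of w_i * r_i, with w_i = pi_e(a_i | x_i) / pi_b(a_i | x_i)."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]  # importance ratio of each logged action
    if clip is not None:
        w = np.minimum(w, clip)  # assumed meaning of `clip`: cap on the ratio
    return float(np.mean(w * rewards))


rng = np.random.default_rng(0)
n, K = 5000, 3
pi_b = np.full((n, K), 1.0 / K)              # uniform behaviour policy
pi_e = np.tile([0.6, 0.3, 0.1], (n, 1))      # evaluation policy, true value 0.6
actions = rng.integers(0, K, size=n)         # uniform logging, vectorised
rewards = (actions == 0).astype(float) + rng.normal(0, 0.1, n)
print(ips_sketch(actions, rewards, pi_b, pi_e))
```

A tighter cap lowers variance but biases the estimate toward zero for positive rewards, since large ratios are shrunk.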

snips

snips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult

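Self-Normalized IPS: divides the importance-weighted reward sum by the sum of the importance ratios instead of by n, which reduces variance at the cost of a small bias.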
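A minimal sketch of the self-normalization, under the same assumption about `clip` as for IPS (`snips_sketch` is a hypothetical name). With a constant reward of 1 every policy is worth exactly 1: SNIPS recovers that exactly, while plain IPS is off by however far the sample mean of the weights drifts from 1.

```python
import numpy as np


def snips_sketch(actions, rewards, pi_b, pi_e, clip=1000.0):
    """SNIPS: sum_i w_i * r_i / sum_i w_i, instead of dividing by n."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]
    if clip is not None:
        w = np.minimum(w, clip)  # assumed meaning of `clip`, as for IPS
    return float(np.sum(w * rewards) / np.sum(w))


rng = np.random.default_rng(1)
n, K = 200, 3
pi_b = np.full((n, K), 1.0 / K)
pi_e = np.tile([0.6, 0.3, 0.1], (n, 1))
actions = rng.integers(0, K, size=n)
rewards = np.ones(n)  # constant reward: the true value of any policy is 1

idx = np.arange(n)
w = pi_e[idx, actions] / pi_b[idx, actions]
print("IPS:  ", np.mean(w * rewards))                         # generally not exactly 1
print("SNIPS:", snips_sketch(actions, rewards, pi_b, pi_e))   # exactly 1.0
```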

doubly_robust

doubly_robust(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model: RewardModel, clip: float | None = 1000.0) -> OPEResult

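Doubly Robust estimator (Dudík, Langford & Li 2011): adds an importance-weighted correction on the reward model's residuals to the DM estimate, so it remains consistent if either the reward model or the behaviour propensities `pi_b` are correct.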
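A minimal sketch of the DR formula, (1/n) Σ_i [Σ_a pi_e(a | x_i) Q(x_i, a) + w_i (r_i − Q(x_i, a_i))], again assuming `clip` caps the ratio. `dr_sketch`, `good_q` and `bad_q` are hypothetical. It contrasts a correct reward model with a badly wrong one: with the wrong model DR reduces to IPS and stays unbiased because `pi_b` is known, while the correct model gives a much less noisy estimate.

```python
import numpy as np


def dr_sketch(X, actions, rewards, pi_b, pi_e, reward_model, clip=1000.0):
    """DR: DM value plus an importance-weighted correction on the model residual."""
    n, K = pi_e.shape
    idx = np.arange(n)
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])  # (n, K)
    dm_term = np.sum(pi_e * q_hat, axis=1)            # model-based value per row
    w = pi_e[idx, actions] / pi_b[idx, actions]
    if clip is not None:
        w = np.minimum(w, clip)  # assumed meaning of `clip`, as for IPS
    correction = w * (rewards - q_hat[idx, actions])  # IPS applied to the residual
    return float(np.mean(dm_term + correction))


def good_q(X, a):  # correct model: E[R | X, a] = 1 if a == 0 else 0
    return np.full(len(X), float(a == 0))


def bad_q(X, a):  # badly misspecified model: predicts 0 everywhere
    return np.zeros(len(X))


rng = np.random.default_rng(0)
n, K = 5000, 3
X = rng.normal(size=(n, 2))
pi_b = np.full((n, K), 1.0 / K)
pi_e = np.tile([0.6, 0.3, 0.1], (n, 1))
actions = rng.integers(0, K, size=n)
rewards = (actions == 0).astype(float) + rng.normal(0, 0.1, n)

print(dr_sketch(X, actions, rewards, pi_b, pi_e, good_q))  # close to 0.6, little noise
print(dr_sketch(X, actions, rewards, pi_b, pi_e, bad_q))   # equals IPS here, still near 0.6
```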

switch_dr

switch_dr(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model: RewardModel, tau: float = 10.0) -> OPEResult

Switch-DR (Wang, Agarwal & Dudík 2017): falls back to the DM whenever the importance ratio exceeds `tau`.
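A minimal sketch of one common way to write Switch-DR: keep the DM term for every row and apply the DR correction only where the importance ratio is at most `tau`, so rows above `tau` fall back to pure DM. The library's exact per-row rule may differ; `switch_dr_sketch` and `zero_q` are hypothetical. The skewed logging policy gives action 0 a ratio of 12, and the deliberately poor model shows the trade-off: `tau=inf` is plain DR (unbiased, noisy), `tau=0` is plain DM (here, badly biased), and `tau=10` sits in between.

```python
import numpy as np


def switch_dr_sketch(X, actions, rewards, pi_b, pi_e, reward_model, tau=10.0):
    """DM for every row, plus the DR correction only where w_i <= tau."""
    n, K = pi_e.shape
    idx = np.arange(n)
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])
    dm_term = np.sum(pi_e * q_hat, axis=1)
    w = pi_e[idx, actions] / pi_b[idx, actions]
    residual = rewards - q_hat[idx, actions]
    # Rows with a ratio above tau get no correction, i.e. they fall back to pure DM.
    correction = np.where(w <= tau, w * residual, 0.0)
    return float(np.mean(dm_term + correction))


def zero_q(X, a):  # hypothetical, deliberately poor reward model
    return np.zeros(len(X))


rng = np.random.default_rng(0)
n, K = 2000, 3
X = rng.normal(size=(n, 2))
pi_b = np.tile([0.05, 0.475, 0.475], (n, 1))  # rarely logs action 0 (ratio 12)
pi_e = np.tile([0.6, 0.3, 0.1], (n, 1))       # true value 0.6
actions = rng.choice(K, size=n, p=pi_b[0])
rewards = (actions == 0).astype(float) + rng.normal(0, 0.1, n)

for tau in (np.inf, 10.0, 0.0):
    print(tau, switch_dr_sketch(X, actions, rewards, pi_b, pi_e, zero_q, tau=tau))
```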

evaluate

evaluate(method: str, *, X: ndarray | None = None, actions: ndarray | None = None, rewards: ndarray | None = None, pi_b: ndarray | None = None, pi_e: ndarray | None = None, reward_model: RewardModel | None = None, **kw: Any) -> OPEResult

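Dispatches to an OPE estimator by name. All data inputs are keyword-only; supply the ones the chosen estimator requires.

`method` must be one of "DM", "IPS", "SNIPS", "DR" or "Switch-DR".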
