statspai.ope

ope

Off-Policy Evaluation (sp.ope): estimate the value of a target policy from data collected under a different behaviour policy. Covers contextual bandits and off-policy reinforcement learning evaluation.

Implemented estimators: direct method (DM), inverse propensity scoring (IPS), self-normalized IPS (SNIPS), doubly robust (DR), Switch-DR, sharp OPE under unobserved confounding (Hess, Frauen, Melnychuk & Feuerriegel 2025, arXiv:2502.13022), and causal-policy forest (Kato 2025, arXiv:2512.22846).

OPEResult dataclass

Canonical Off-Policy Evaluation result.

All sp.ope.* estimators and the top-level sp.direct_method, sp.ips, sp.snips and sp.doubly_robust functions return an instance of this class. The policy_learning OPEResult subclass is a thin alias that adds an estimator attribute for backwards compatibility, so isinstance(res, sp.OPEResult) holds for results from either entry point.

estimator property

estimator: str

Backwards-compatible alias for the method attribute.

n_obs property

n_obs: int

Convenience accessor: returns diagnostics['n'] or diagnostics['n_obs'] if present.
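The two properties above can be pictured with a small stand-in class. This is only a sketch of the described behaviour, not the library's dataclass: the name OPEResultSketch, the value field, and the choice to return None when neither key is present are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OPEResultSketch:
    """Hypothetical stand-in for OPEResult; field names are assumptions."""
    method: str
    value: float
    diagnostics: dict = field(default_factory=dict)

    @property
    def estimator(self) -> str:
        # Alias kept for older code that reads `.estimator` instead of `.method`.
        return self.method

    @property
    def n_obs(self) -> Optional[int]:
        # Look up 'n' first, then 'n_obs'; None (an assumption) if neither exists.
        for key in ("n", "n_obs"):
            if key in self.diagnostics:
                return int(self.diagnostics[key])
        return None


res = OPEResultSketch(method="IPS", value=0.62, diagnostics={"n_obs": 500})
```

Here res.estimator and res.method both read "IPS", and res.n_obs is taken from the diagnostics dictionary.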

direct_method

direct_method(reward_model, X: ndarray, pi_e: ndarray) -> OPEResult

Plug-in estimator using a fitted reward model Q(x, a).

Parameters:

Name          Type          Description                                                       Default
reward_model  callable      reward_model(X, a) → vector of length n predicting E[R | X, a].   required
X             (n, d) array                                                                    required
pi_e          (n, K) array  Evaluation policy probabilities over K actions at each X_i.       required
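The plug-in idea can be sketched in plain NumPy: average, over contexts, the model's predicted reward weighted by the evaluation policy. This is a minimal illustration and not the library's code. The helper name dm_value_sketch and the toy reward model are hypothetical, and the sketch assumes reward_model accepts a scalar action index a.

```python
import numpy as np


def dm_value_sketch(reward_model, X, pi_e):
    """Plug-in value: mean over i of sum_a pi_e[i, a] * Q(X_i, a)."""
    n, K = pi_e.shape
    # Column a holds the predicted reward for playing action a in every context.
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


# Toy data with a hypothetical reward model Q(x, a) = x[0] + a.
X = np.array([[0.0], [1.0], [2.0]])
pi_e = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
toy_model = lambda X, a: X[:, 0] + a

dm_value = dm_value_sketch(toy_model, X, pi_e)  # (0 + 1.5 + 3) / 3 = 1.5
```

The estimate is only as good as the reward model: no logged actions or rewards enter the calculation once the model is fitted.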

ips

ips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult

Inverse Propensity Scoring (aka IPS / Horvitz-Thompson).
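As a rough illustration, IPS reweights each logged reward by the ratio of evaluation to behaviour probability for the action that was actually taken. The sketch below is not the library's implementation. It assumes pi_b and pi_e are (n, K) probability matrices indexed by the logged action, and that clip caps the importance weights from above; the name ips_value_sketch is hypothetical.

```python
import numpy as np


def ips_value_sketch(actions, rewards, pi_b, pi_e, clip=1000.0):
    """Mean of w_i * r_i with w_i = pi_e(a_i | x_i) / pi_b(a_i | x_i)."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]
    if clip is not None:
        # Assumed semantics: cap large weights to limit variance.
        w = np.minimum(w, clip)
    return float(np.mean(w * rewards))


actions = np.array([0, 1, 1, 0])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
pi_b = np.full((4, 2), 0.5)              # uniform logging policy
pi_e = np.tile([0.25, 0.75], (4, 1))     # target policy prefers action 1

ips_value = ips_value_sketch(actions, rewards, pi_b, pi_e)              # 0.625
ips_clipped = ips_value_sketch(actions, rewards, pi_b, pi_e, clip=1.0)  # 0.5
```

Clipping trades variance for bias: with clip=1.0 the two weights of 1.5 are capped at 1, which lowers the estimate in this toy example.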

snips

snips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult

Self-Normalized IPS: reduces variance at the cost of a small bias.
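Self-normalization divides the weighted reward sum by the sum of the weights instead of by n. A minimal sketch follows, under the same assumptions as a plain IPS estimator: (n, K) probability matrices, logged-action indexing, and upper clipping of weights. The function name snips_value_sketch is hypothetical.

```python
import numpy as np


def snips_value_sketch(actions, rewards, pi_b, pi_e, clip=1000.0):
    """sum_i w_i r_i / sum_i w_i, with w_i the importance weight."""
    idx = np.arange(len(actions))
    w = pi_e[idx, actions] / pi_b[idx, actions]
    if clip is not None:
        w = np.minimum(w, clip)
    return float(np.sum(w * rewards) / np.sum(w))


actions = np.array([1, 1, 1, 0])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
pi_b = np.full((4, 2), 0.5)
pi_e = np.tile([0.25, 0.75], (4, 1))

# Weights are [1.5, 1.5, 1.5, 0.5]; weighted rewards sum to 3.5, weights to 5.
snips_value = snips_value_sketch(actions, rewards, pi_b, pi_e)  # 3.5 / 5 = 0.7

# With a constant reward of 1 the estimate is exactly 1, whatever the weights.
snips_const = snips_value_sketch(actions, np.ones(4), pi_b, pi_e)
```

Because the result is a weighted average of observed rewards, it always stays within their range, which plain IPS does not guarantee (the unnormalized estimate on the same data is 3.5 / 4 = 0.875).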

doubly_robust

doubly_robust(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model, clip: float | None = 1000.0) -> OPEResult

Doubly Robust estimator (Dudík, Langford & Li 2011).
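The doubly robust estimate combines the two previous ideas: a model-based baseline plus an importance-weighted correction on the model's residual at the logged action. The sketch below shows the standard textbook form and is not taken from the library's source. It assumes (n, K) probability matrices, a reward_model that accepts a scalar action index, and upper clipping of the weights; dr_value_sketch is a hypothetical name.

```python
import numpy as np


def dr_value_sketch(X, actions, rewards, pi_b, pi_e, reward_model, clip=1000.0):
    n, K = pi_e.shape
    idx = np.arange(n)
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])
    # Model-based baseline: expected predicted reward under pi_e.
    dm_term = np.sum(pi_e * q_hat, axis=1)
    w = pi_e[idx, actions] / pi_b[idx, actions]
    if clip is not None:
        w = np.minimum(w, clip)
    # Importance-weighted residual of the model at the logged action.
    correction = w * (rewards - q_hat[idx, actions])
    return float(np.mean(dm_term + correction))


X = np.array([[0.0], [1.0], [2.0], [3.0]])
actions = np.array([0, 1, 1, 0])
rewards = X[:, 0] + actions              # rewards generated by Q(x, a) = x[0] + a
pi_b = np.full((4, 2), 0.5)
pi_e = np.tile([0.25, 0.75], (4, 1))

perfect_model = lambda X, a: X[:, 0] + a
zero_model = lambda X, a: np.zeros(len(X))

dr_perfect = dr_value_sketch(X, actions, rewards, pi_b, pi_e, perfect_model)
dr_zero = dr_value_sketch(X, actions, rewards, pi_b, pi_e, zero_model)
```

With a perfect reward model the correction vanishes and the result equals the direct-method value; with an all-zero model the baseline vanishes and the result equals plain IPS.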

switch_dr

switch_dr(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model, tau: float = 10.0) -> OPEResult

Switch-DR (Wang, Agarwal, Dudík 2017): fall back to the DM whenever the importance ratio exceeds tau.
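One common way to write this switching rule keeps the model-based baseline for every sample and applies the importance-weighted correction only where the weight is at most tau. The sketch below follows that formulation. Whether the library uses exactly this form is an assumption, as are the (n, K) probability matrices and the name switch_dr_value_sketch.

```python
import numpy as np


def switch_dr_value_sketch(X, actions, rewards, pi_b, pi_e, reward_model, tau=10.0):
    n, K = pi_e.shape
    idx = np.arange(n)
    q_hat = np.column_stack([reward_model(X, a) for a in range(K)])
    dm_term = np.sum(pi_e * q_hat, axis=1)
    w = pi_e[idx, actions] / pi_b[idx, actions]
    # Samples with w > tau contribute only the direct-method term.
    keep = w <= tau
    correction = np.where(keep, w * (rewards - q_hat[idx, actions]), 0.0)
    return float(np.mean(dm_term + correction))


X = np.array([[0.0], [1.0], [2.0], [3.0]])
actions = np.array([0, 1, 1, 0])
rewards = np.array([0.0, 2.0, 3.0, 3.0])
pi_b = np.full((4, 2), 0.5)
pi_e = np.tile([0.25, 0.75], (4, 1))     # weights are [0.5, 1.5, 1.5, 0.5]
zero_model = lambda X, a: np.zeros(len(X))

# tau = 1.0 drops the correction for the two samples with weight 1.5.
v_switch = switch_dr_value_sketch(X, actions, rewards, pi_b, pi_e, zero_model, tau=1.0)
# A large tau keeps every correction (plain DR); tau = 0 keeps none (plain DM).
v_all = switch_dr_value_sketch(X, actions, rewards, pi_b, pi_e, zero_model, tau=10.0)
v_none = switch_dr_value_sketch(X, actions, rewards, pi_b, pi_e, zero_model, tau=0.0)
```

The threshold interpolates between the two extremes: tau controls how much weight variance is tolerated before the estimate trusts the reward model alone.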

evaluate

evaluate(method: str, *, X: ndarray | None = None, actions: ndarray | None = None, rewards: ndarray | None = None, pi_b: ndarray | None = None, pi_e: ndarray | None = None, reward_model=None, **kw) -> OPEResult

Dispatch-by-name for OPE methods.

method: one of "DM", "IPS", "SNIPS", "DR", "Switch-DR".
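Dispatch-by-name is typically a lookup from the method string to an estimator function, with keyword arguments forwarded unchanged. The sketch below illustrates the pattern with two toy estimators only. The registry, the helper names, and the error message are hypothetical and say nothing about how the library resolves names internally.

```python
import numpy as np


def _weights(actions, pi_b, pi_e):
    idx = np.arange(len(actions))
    return pi_e[idx, actions] / pi_b[idx, actions]


def _ips(*, actions, rewards, pi_b, pi_e, **kw):
    return float(np.mean(_weights(actions, pi_b, pi_e) * rewards))


def _snips(*, actions, rewards, pi_b, pi_e, **kw):
    w = _weights(actions, pi_b, pi_e)
    return float(np.sum(w * rewards) / np.sum(w))


# Hypothetical registry covering two of the documented method names.
_REGISTRY = {"IPS": _ips, "SNIPS": _snips}


def evaluate_sketch(method, **kwargs):
    try:
        estimator = _REGISTRY[method]
    except KeyError:
        raise ValueError(
            f"unknown method {method!r}; expected one of {sorted(_REGISTRY)}"
        ) from None
    return estimator(**kwargs)


data = dict(
    actions=np.array([1, 1, 1, 0]),
    rewards=np.array([1.0, 0.0, 1.0, 1.0]),
    pi_b=np.full((4, 2), 0.5),
    pi_e=np.tile([0.25, 0.75], (4, 1)),
)
v_ips = evaluate_sketch("IPS", **data)      # 3.5 / 4 = 0.875
v_snips = evaluate_sketch("SNIPS", **data)  # 3.5 / 5 = 0.7
```

Keyword-only arguments mirror the documented signature, where everything after method is passed by name so that each estimator can pick out the inputs it needs.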