statspai.ope
Off-Policy Evaluation (sp.ope): estimate the value of a target (evaluation)
policy pi_e from data collected under a different behaviour policy pi_b. Covers
contextual bandits and off-policy evaluation in reinforcement learning.
Implemented: direct method (DM), inverse propensity scoring (IPS), self-normalized IPS (SNIPS), doubly robust (DR), Switch-DR, sharp OPE under unobserved confounding (Hess, Frauen, Melnychuk & Feuerriegel 2025, arXiv:2502.13022), and the causal-policy forest (Kato 2025, arXiv:2512.22846).
OPEResult (dataclass)
Canonical Off-Policy Evaluation result.
All sp.ope.* estimators and sp.direct_method / sp.ips / sp.snips / sp.doubly_robust
return instances of this class. The OPEResult in policy_learning is a thin subclass
(effectively an alias) that only adds an estimator attribute for backward
compatibility, so isinstance(res, sp.OPEResult) holds for results from either entry point.
direct_method
direct_method(reward_model, X: ndarray, pi_e: ndarray) -> OPEResult
Plug-in estimator using a fitted reward model Q(x, a).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reward_model | callable | Fitted reward model Q(x, a). | required |
| X | (n, d) array | Contexts (covariates), one row per observation X_i. | required |
| pi_e | (n, K) array | Evaluation policy probabilities over K actions at each X_i. | required |
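The direct method averages the fitted reward model over the evaluation policy: V_DM = (1/n) Σ_i Σ_a pi_e[i, a] · Q(X_i, a). The sketch below illustrates that formula in NumPy. It assumes the reward model has already been evaluated into an (n, K) matrix q_hat with q_hat[i, a] = Q(X_i, a); the helper dm_estimate and the q_hat input are hypothetical and are not part of sp.ope, whose direct_method takes a callable reward_model instead.

```python
import numpy as np


def dm_estimate(q_hat: np.ndarray, pi_e: np.ndarray) -> float:
    """Direct-method value: mean over i of sum_a pi_e[i, a] * Q(X_i, a).

    q_hat : (n, K) matrix of fitted rewards Q(X_i, a); a hypothetical input
            format standing in for the reward_model callable.
    pi_e  : (n, K) evaluation-policy probabilities at each X_i.
    """
    q_hat = np.asarray(q_hat, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    # Expected fitted reward under pi_e at each context, averaged over contexts.
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


q_hat = np.array([[1.0, 0.0], [0.5, 2.0]])
pi_e = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dm_estimate(q_hat, pi_e))  # 1.5, the mean of 1.0 and 2.0
```

Because DM never looks at the logged rewards directly, its accuracy depends entirely on how well Q is specified.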
ips
ips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult
Inverse Propensity Scoring (aka IPS / Horvitz-Thompson).
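IPS reweights each logged reward by the importance ratio w_i = pi_e(a_i | X_i) / pi_b(a_i | X_i) and averages: V_IPS = (1/n) Σ_i w_i r_i. A minimal NumPy sketch, assuming pi_b and pi_e are both (n, K) probability matrices and that clip caps each weight from above (None disables it). Both conventions are assumptions made for illustration; the documentation above does not specify the shape of pi_b or exactly what clip does, and ips_estimate is a hypothetical helper.

```python
import numpy as np


def ips_estimate(actions, rewards, pi_b, pi_e, clip=1000.0):
    """IPS value: mean of w_i * r_i, with w_i = pi_e[i, a_i] / pi_b[i, a_i].

    Assumes pi_b and pi_e are (n, K) probability matrices and that `clip`
    caps each importance weight from above (None disables the cap).
    """
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    rows = np.arange(len(actions))
    # Importance weight of each logged action.
    w = pi_e[rows, actions] / pi_b[rows, actions]
    if clip is not None:
        w = np.minimum(w, clip)
    return float(np.mean(w * rewards))


actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
pi_b = np.full((4, 2), 0.5)          # uniform logging policy
pi_e = np.tile([0.9, 0.1], (4, 1))   # target policy favours action 0
print(round(ips_estimate(actions, rewards, pi_b, pi_e), 6))  # 0.95
```

Capping the weights bounds the influence of any single observation, which lowers variance at the price of some bias when the cap binds.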
snips
snips(actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, clip: float | None = 1000.0) -> OPEResult
Self-Normalized IPS: reduces variance at the cost of a small bias.
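SNIPS divides the weighted reward sum by the sum of the importance weights rather than by n: V_SNIPS = Σ_i w_i r_i / Σ_i w_i. The sketch below follows the same hypothetical conventions as a plain IPS sketch ((n, K) probability matrices, clip as an upper cap on the weights); snips_estimate is an illustrative helper, not the sp.ope.snips implementation.

```python
import numpy as np


def snips_estimate(actions, rewards, pi_b, pi_e, clip=1000.0):
    """SNIPS value: sum(w_i * r_i) / sum(w_i).

    Hypothetical conventions: (n, K) probability matrices for pi_b and
    pi_e, and `clip` as an upper cap on the importance weights.
    """
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    rows = np.arange(len(actions))
    pe = np.asarray(pi_e, dtype=float)[rows, actions]
    pb = np.asarray(pi_b, dtype=float)[rows, actions]
    w = pe / pb
    if clip is not None:
        w = np.minimum(w, clip)
    # Dividing by the realised weight total instead of n makes the estimate a
    # weighted average of observed rewards, so it stays within their range.
    return float(np.sum(w * rewards) / np.sum(w))


actions = np.array([0, 1, 1, 1])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
pi_b = np.full((4, 2), 0.5)
pi_e = np.tile([0.9, 0.1], (4, 1))
print(round(snips_estimate(actions, rewards, pi_b, pi_e), 4))  # 0.9167
```

On this sample the action favoured by pi_e was logged only once, so the weights sum to 2.4 rather than their expected value of n = 4. Plain IPS divides by 4 and returns 2.2 / 4 = 0.55, while SNIPS divides by 2.4 and returns about 0.9167.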
doubly_robust
doubly_robust(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model, clip: float | None = 1000.0) -> OPEResult
Doubly Robust estimator (Dudík, Langford & Li 2011).
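The doubly robust estimate starts from the direct-method term and adds an importance-weighted correction of the reward model's residual on the logged action: V_DR = (1/n) Σ_i [ Σ_a pi_e[i, a] Q(X_i, a) + w_i (r_i − Q(X_i, a_i)) ]. It remains consistent if either the reward model or the behaviour propensities are correct. In this sketch, the (n, K) matrix q_hat standing in for reward_model, the (n, K) shape of pi_b, and clip acting as an upper cap on w_i are assumptions. The helper dr_estimate is hypothetical.

```python
import numpy as np


def dr_estimate(q_hat, actions, rewards, pi_b, pi_e, clip=1000.0):
    """DR value: DM term plus an importance-weighted residual correction.

    q_hat (n, K) fitted rewards stands in for reward_model; `clip` is
    assumed to cap the weights w_i from above (hypothetical semantics).
    """
    q_hat = np.asarray(q_hat, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    rows = np.arange(len(actions))
    dm_term = np.sum(pi_e * q_hat, axis=1)            # model-based value at each X_i
    w = pi_e[rows, actions] / pi_b[rows, actions]      # importance weights
    if clip is not None:
        w = np.minimum(w, clip)
    correction = w * (rewards - q_hat[rows, actions])  # reweighted residual
    return float(np.mean(dm_term + correction))


q_hat = np.array([[1.0, 0.0], [0.5, 2.0]])
pi_e = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_b = np.full((2, 2), 0.5)
actions = np.array([0, 1])
rewards = np.array([1.0, 1.0])
print(dr_estimate(q_hat, actions, rewards, pi_b, pi_e))  # 0.5
```

Here the model predicts 2.0 for the second observation, but the logged reward is 1.0. The correction term therefore pulls the estimate down from the DM value of 1.5 to 0.5. When the model's predictions match the logged rewards, the correction vanishes and DR coincides with DM.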
switch_dr
switch_dr(X: ndarray, actions: ndarray, rewards: ndarray, pi_b: ndarray, pi_e: ndarray, reward_model, tau: float = 10.0) -> OPEResult
Switch-DR (Wang, Agarwal & Dudík 2017): falls back to the direct method (DM)
whenever the importance ratio pi_e / pi_b exceeds tau.
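One way to write Switch-DR is as the DR estimate with the residual correction dropped for every observation whose weight exceeds tau, so that those observations contribute only their DM term. The sketch below uses that per-observation switching rule. Whether sp.ope.switch_dr switches in exactly this way is an assumption, as is the (n, K) matrix q_hat that stands in for reward_model. The helper switch_dr_estimate is hypothetical.

```python
import numpy as np


def switch_dr_estimate(q_hat, actions, rewards, pi_b, pi_e, tau=10.0):
    """Switch-DR value: DR, but the correction is dropped where w_i > tau.

    q_hat ((n, K) fitted rewards) is a hypothetical stand-in for reward_model.
    """
    q_hat = np.asarray(q_hat, dtype=float)
    pi_b = np.asarray(pi_b, dtype=float)
    pi_e = np.asarray(pi_e, dtype=float)
    actions = np.asarray(actions, dtype=int)
    rewards = np.asarray(rewards, dtype=float)
    rows = np.arange(len(actions))
    dm_term = np.sum(pi_e * q_hat, axis=1)
    w = pi_e[rows, actions] / pi_b[rows, actions]
    # Keep the reweighted residual only where the ratio is at most tau.
    correction = np.where(w <= tau, w * (rewards - q_hat[rows, actions]), 0.0)
    return float(np.mean(dm_term + correction))


q_hat = np.array([[1.0, 0.0], [0.5, 2.0]])
pi_e = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_b = np.full((2, 2), 0.5)
actions = np.array([0, 1])
rewards = np.array([1.0, 1.0])
print(switch_dr_estimate(q_hat, actions, rewards, pi_b, pi_e, tau=10.0))  # 0.5 (pure DR)
print(switch_dr_estimate(q_hat, actions, rewards, pi_b, pi_e, tau=1.5))   # 1.5 (pure DM)
```

Every weight in this example equals 2. A large tau therefore reproduces DR, while tau = 1.5 switches every observation to DM. Intermediate values mix the two estimators.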
evaluate
evaluate(method: str, *, X: ndarray | None = None, actions: ndarray | None = None, rewards: ndarray | None = None, pi_b: ndarray | None = None, pi_e: ndarray | None = None, reward_model=None, **kw) -> OPEResult
Dispatch-by-name for OPE methods.
method : {"DM", "IPS", "SNIPS", "DR", "Switch-DR"}
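evaluate selects an estimator by its method name. The sketch below shows one way such name-based dispatch can work. The registry, the case- and punctuation-insensitive name matching, the required-input check, the forwarding of extra keyword arguments to the chosen estimator, and the q_hat input are all hypothetical choices made for this sketch; they do not describe the behaviour of sp.ope.evaluate. Only DM and IPS are registered, to keep the example short.

```python
from typing import Any, Callable, Dict, Tuple

import numpy as np

# Hypothetical registry: canonical name -> (required inputs, estimator function).
Registry = Dict[str, Tuple[Tuple[str, ...], Callable[..., float]]]


def _normalise(name: str) -> str:
    # "Switch-DR", "switch_dr" and "SWITCHDR" all map to "switchdr".
    return name.lower().replace("-", "").replace("_", "")


def dispatch(method: str, registry: Registry, inputs: Dict[str, Any], **kw: Any) -> float:
    """Look up `method`, check its required inputs, and forward extra **kw."""
    lookup = {_normalise(name): name for name in registry}
    canonical = lookup.get(_normalise(method))
    if canonical is None:
        raise ValueError(f"unknown method {method!r}; expected one of {sorted(registry)}")
    required, fn = registry[canonical]
    missing = [arg for arg in required if inputs.get(arg) is None]
    if missing:
        raise ValueError(f"{canonical} requires {missing}")
    return fn(**{arg: inputs[arg] for arg in required}, **kw)


def _dm(q_hat, pi_e):
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


def _ips(actions, rewards, pi_b, pi_e, clip=None):
    rows = np.arange(len(actions))
    w = pi_e[rows, actions] / pi_b[rows, actions]
    if clip is not None:
        w = np.minimum(w, clip)  # hypothetical upper cap on the weights
    return float(np.mean(w * rewards))


REGISTRY: Registry = {
    "DM": (("q_hat", "pi_e"), _dm),
    "IPS": (("actions", "rewards", "pi_b", "pi_e"), _ips),
}

inputs = {
    "q_hat": np.array([[1.0, 0.0], [0.5, 2.0]]),
    "pi_e": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "pi_b": np.full((2, 2), 0.5),
    "actions": np.array([0, 1]),
    "rewards": np.array([1.0, 1.0]),
}
print(dispatch("dm", REGISTRY, inputs))             # 1.5
print(dispatch("IPS", REGISTRY, inputs, clip=1.0))  # 1.0
```

Checking required inputs up front turns a missing array into a clear error message rather than a failure deep inside the estimator. This matters because every array argument of evaluate defaults to None.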