`statspai.policy_learning`¶

policy_learning ¶

Policy Learning: Optimal treatment assignment from heterogeneous effects.

Learns an interpretable treatment-assignment policy that maximises the expected welfare (value) of the population. Given estimated conditional average treatment effects (CATE), it finds the optimal tree-based policy, answering the question "who should be treated?"
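As a rough illustration of the search problem, the sketch below finds the best depth-1 rule by brute force over every feature, threshold, and side. It is not the `PolicyTree` implementation: `best_stump` and its return format are hypothetical, and `scores` stands for per-unit treatment-gain estimates (for example CATE or doubly robust scores) that are assumed to have been computed elsewhere.

```python
import numpy as np


def best_stump(X, scores):
    """Brute-force depth-1 policy search (hypothetical helper, not the statspai API).

    Returns (value, feature, threshold, treat_left), where value is the mean
    gain over the whole sample when only the selected side is treated.
    """
    n, p = X.shape
    best = (0.0, None, None, None)  # baseline: treat nobody, value 0
    for j in range(p):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            for treat_left in (True, False):
                treated = left if treat_left else ~left
                value = scores[treated].sum() / n
                if value > best[0]:
                    best = (float(value), j, float(t), treat_left)
    return best


X_demo = np.array([[-2.0], [-1.0], [1.0], [2.0]])
scores_demo = np.array([-1.0, -1.0, 1.0, 1.0])  # only units with x > 0 gain
stump = best_stump(X_demo, scores_demo)  # treats the units with x > -1
```

A deeper tree would repeat this kind of search inside each child node, so the cost of an exhaustive search grows quickly with depth.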

Components

PolicyTree : Optimal depth-limited decision tree for treatment assignment (Athey & Wager 2021).
policy_value : Evaluate the expected value of a treatment policy using doubly robust scores.
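
For orientation, the doubly robust (AIPW) score of a binary treatment is commonly written as the outcome-model contrast plus inverse-propensity-weighted residuals. The sketch below evaluates that textbook formula. The function name is hypothetical, and the nuisance estimates `mu0`, `mu1` (outcome models) and `e` (propensity score) are assumed to be already available, for example from cross-fitting; it does not describe how statspai computes them internally.

```python
import numpy as np


def aipw_scores(y, w, mu0, mu1, e):
    """Textbook AIPW score for a binary treatment (illustrative helper).

    y: outcomes, w: 0/1 treatment, mu0/mu1: predicted outcomes under control
    and treatment, e: estimated propensity P(W = 1 | X).
    """
    return mu1 - mu0 + w * (y - mu1) / e - (1 - w) * (y - mu0) / (1 - e)


y_demo = np.array([3.0, 1.0])
w_demo = np.array([1, 0])
mu0_demo = np.array([1.0, 1.0])
mu1_demo = np.array([2.0, 2.0])
e_demo = np.array([0.5, 0.5])
gamma_demo = aipw_scores(y_demo, w_demo, mu0_demo, mu1_demo, e_demo)
```

In this toy input the treated unit's residual of 1 is up-weighted by 1/0.5, so its score is 1 + 2 = 3, while the control unit has zero residual and keeps the model contrast of 1.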

References

Athey, S. & Wager, S. (2021). Policy Learning with Observational Data. Econometrica, 89(1), 133-161. [@athey2021matrix]

Zhou, Z., Athey, S., & Wager, S. (2023). Offline Multi-Action Policy Learning: Generalization and Optimization. Operations Research, 71(1), 148-183. [@zhou2023offline]

PolicyTree ¶

Optimal depth-limited policy tree.

Parameters:

Name	Type	Default
`data`	`DataFrame`	required
`y`	`str`	required
`treat`	`str`	required
`covariates`	`list of str`	required
`policy_covariates`	`list of str`	`None`
`max_depth`	`int`	`2`
`min_leaf_size`	`int`	`25`
`n_folds`	`int`	`5`
`alpha`	`float`	`0.05`
`random_state`	`int`	`42`

Examples:

Fit the estimator directly, then route fresh covariates through the learned rule with :meth:`predict`:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x1 = rng.normal(size=n)
>>> x2 = rng.normal(size=n)
>>> treat = rng.integers(0, 2, n)
>>> y = 1.0 + 2.0 * (x1 > 0) * treat + 0.5 * x2 + rng.normal(0, 1, n)
>>> df = pd.DataFrame({"y": y, "treat": treat, "x1": x1, "x2": x2})
>>> tree = sp.PolicyTree(data=df, y="y", treat="treat",
...                      covariates=["x1", "x2"], max_depth=2,
...                      min_leaf_size=30, n_folds=3, random_state=0)
>>> res = tree.fit()
>>> bool(res["value_gain"] >= 0)
True
>>> rec = tree.predict(np.array([[1.5, 0.0], [-1.5, 0.0]]))
>>> int(rec.shape[0])
2
>>> bool(set(int(v) for v in rec) <= {0, 1})
True

fit ¶

fit() -> Dict[str, Any]

Learn the optimal policy tree.

predict ¶

predict(X_new: ndarray) -> ndarray

Predict treatment assignment for new data.

Parameters:

Name	Type	Description	Default
`X_new`	`ndarray(n, p)`	Policy covariates for new observations.	required

Returns:

Type	Description
`ndarray(n)`	Binary treatment recommendations (0 or 1).

PolicyTreeResult ¶

Bases: dict, ResultProtocolMixin

Result of :func:`policy_tree`.

Inherits from :class:`dict` so the legacy `result['policy']` API keeps working (and `isinstance(result, dict)` is still True), while also exposing rich attribute access plus methods:

:attr:`value_policy_se`: influence-function SE of the policy value, computed from the AIPW scores :math:`\Gamma_i` and the binary policy :math:`\hat\pi(X_i)`. Under the standard cross-fit / overlap conditions this is asymptotically valid.
:meth:`summary` / :meth:`plot_tree` / :meth:`to_latex` / :meth:`cite`, which match the Stata / R reporting idioms.
:meth:`to_excel` for publication exports.

The `tree` attribute holds the fitted :class:`PolicyTree` instance so :meth:`PolicyTree.predict` is reachable downstream.
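
One common way to obtain an influence-function standard error for a policy value is to treat each unit's contribution (policy times AIPW score) as an i.i.d. draw and take the standard error of its mean. The sketch below does exactly that. It is an assumption about the general recipe, not a transcription of the library code, and `policy_value_se` is a hypothetical name.

```python
import numpy as np


def policy_value_se(gamma, policy):
    """Standard error of mean(policy * gamma) (illustrative helper)."""
    psi = policy * gamma  # per-unit contribution to the policy value
    n = psi.shape[0]
    return float(np.std(psi, ddof=1) / np.sqrt(n))


gamma_demo = np.array([1.0, -1.0, 3.0, 1.0])  # made-up AIPW scores
policy_demo = np.array([1, 0, 1, 1])          # made-up binary policy
se_demo = policy_value_se(gamma_demo, policy_demo)
```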

Examples:

Produced by :func:`policy_tree`; the legacy dict API still works alongside attribute access and the rich reporting methods:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x1 = rng.normal(size=n)
>>> x2 = rng.normal(size=n)
>>> treat = rng.integers(0, 2, n)
>>> y = 1.0 + 2.0 * (x1 > 0) * treat + 0.5 * x2 + rng.normal(0, 1, n)
>>> df = pd.DataFrame({"y": y, "treat": treat, "x1": x1, "x2": x2})
>>> res = sp.policy_tree(df, y="y", treat="treat",
...                      covariates=["x1", "x2"], max_depth=2,
...                      min_leaf_size=30, n_folds=3, random_state=0)
>>> isinstance(res, sp.PolicyTreeResult)
True
>>> isinstance(res, dict)               # legacy result['policy'] still works
True
>>> res["n_obs"]
300
>>> bool(0.0 <= res.fraction_treated <= 1.0)
True
>>> "begin{table}" in res.to_latex()        # LaTeX table output
True

plot_tree ¶

plot_tree(ax: Any = None, figsize: Tuple[float, float] = (8.0, 5.0), node_color: str = '#e8f0fe') -> Tuple[Any, Any]

Draw the policy tree as a labeled hierarchical diagram.

Each split node shows feature ≤ threshold; each leaf shows TREAT / DON'T TREAT plus the leaf value (mean AIPW score). Requires matplotlib.

to_latex ¶

to_latex(caption: Optional[str] = None, label: str = 'tab:policy_tree') -> str

Render a publication-style summary table (LaTeX).

to_excel ¶

to_excel(path: str, digits: int = 4) -> str

Write a single-sheet Excel summary.

policy_value ¶

policy_value(scores: ndarray, policy: ndarray) -> float

Evaluate the expected value of a treatment policy.

Parameters:

Name	Type	Description	Default
`scores`	`ndarray(n)`	Doubly robust scores (AIPW pseudo-outcomes for treatment). Positive scores indicate the individual benefits from treatment.	required
`policy`	`ndarray(n)`	Binary policy recommendations (0 or 1).	required

Returns:

Type	Description
`float`	Estimated expected value of the policy.

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 400
>>> scores = rng.normal(0.3, 1.0, size=n)  # DR gains from treating

Treat-everyone vs an oracle policy (treat only positive-gain units):

>>> policy_all = np.ones(n, dtype=int)
>>> policy_oracle = (scores > 0).astype(int)
>>> round(float(sp.policy_value(scores, policy_all)), 2)
0.29
>>> round(float(sp.policy_value(scores, policy_oracle)), 2)
0.54
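
The values above are consistent with a plug-in estimator that averages the scores of the units the policy treats and counts untreated units as zero gain. The sketch below implements that reading on a tiny hand-checkable input. It is an assumption about the convention rather than the library source, and `plug_in_value` is a hypothetical name.

```python
import numpy as np


def plug_in_value(scores, policy):
    """Mean gain when only units with policy == 1 are treated (illustrative)."""
    return float(np.mean(scores * policy))


scores_demo = np.array([1.0, -0.5, 2.0, -1.5])
value_all = plug_in_value(scores_demo, np.ones(4, dtype=int))       # 1.0 / 4
value_oracle = plug_in_value(scores_demo, (scores_demo > 0).astype(int))  # 3.0 / 4
value_none = plug_in_value(scores_demo, np.zeros(4, dtype=int))     # baseline
```

Under this convention the treat-nobody policy has value zero, so any policy value can be read as a gain over not treating anyone.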

direct_method ¶

direct_method(X: ndarray, A: ndarray, R: ndarray, pi_target: Any, n_actions: Optional[int] = None, alpha: float = 0.05) -> OPEResult

Direct outcome regression (plug-in Q-model) OPE.
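
A minimal sketch of the direct-method idea, assuming a separate linear reward model per action fitted by least squares: predict the reward of every action for every context, then average those predictions under the target policy. The choice of Q-model here is an illustration only (the model used by `direct_method` is not documented on this page), and `direct_method_sketch` is a hypothetical name.

```python
import numpy as np


def direct_method_sketch(X, A, R, pi_target, n_actions):
    """Plug-in value: mean over units of sum_a pi_target(a | x) * q_hat(x, a)."""
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])  # add an intercept column
    q_hat = np.zeros((n, n_actions))
    for a in range(n_actions):
        mask = A == a
        # Fit a linear reward model on the units that took action a.
        coef, *_ = np.linalg.lstsq(Xb[mask], R[mask], rcond=None)
        q_hat[:, a] = Xb @ coef
    return float(np.mean(np.sum(pi_target * q_hat, axis=1)))


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
A_demo = np.arange(60) % 3          # each action logged 20 times
R_demo = A_demo.astype(float)       # toy reward: equal to the action index
pi_demo = np.zeros((60, 3))
pi_demo[:, 2] = 1.0                 # target policy always plays action 2
dm_value = direct_method_sketch(X_demo, A_demo, R_demo, pi_demo, n_actions=3)
```

Because the toy reward depends only on the action, the fitted models recover it exactly and the estimate is the reward of action 2. With real data the estimate inherits any misspecification of the reward model.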

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> X = rng.normal(size=(n, 3))
>>> A = rng.integers(0, 3, size=n)
>>> R = rng.normal(size=n)
>>> pi_target = rng.dirichlet(np.ones(3), size=n)
>>> res = sp.direct_method(X, A, R, pi_target)
>>> v = float(res.value)

ips ¶

ips(X: ndarray, A: ndarray, R: ndarray, pi_target: Any, pi_behavior: Optional[ndarray] = None, clip: float = 50.0, alpha: float = 0.05) -> OPEResult

Inverse propensity score OPE.

Notes

If pi_behavior is None and the internal behavior-policy logistic regression fails, propensities fall back to uniform 1/K; a ConvergenceWarning is emitted and diagnostics['propensity_fallback'] is set to True.
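
A minimal sketch of the estimator itself, assuming that `clip` caps the importance weights from above (the page does not spell out the clipping rule): each logged reward is re-weighted by the ratio of target to behavior probability for the action actually taken. `ips_sketch` is a hypothetical name, and the behavior policy is passed in explicitly instead of being estimated.

```python
import numpy as np


def ips_sketch(A, R, pi_target, pi_behavior, clip=50.0):
    """Inverse-propensity estimate: mean of clipped weight times reward."""
    idx = np.arange(len(A))
    w = pi_target[idx, A] / pi_behavior[idx, A]  # importance weights
    w = np.minimum(w, clip)                      # assumed clipping rule
    return float(np.mean(w * R))


A_demo = np.array([0, 1, 1, 0])
R_demo = np.array([1.0, 2.0, 3.0, 4.0])
pi_behavior_demo = np.full((4, 2), 0.5)                 # uniform 1/K logging
pi_target_demo = np.tile(np.array([0.0, 1.0]), (4, 1))  # always play action 1
ips_value = ips_sketch(A_demo, R_demo, pi_target_demo, pi_behavior_demo)
ips_clipped = ips_sketch(A_demo, R_demo, pi_target_demo, pi_behavior_demo, clip=1.0)
```

The uniform `pi_behavior_demo` has the same form as the 1/K fallback described in the note. Clipping the weights at 1.0 halves the estimate here, which shows how aggressive clipping trades variance for bias.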

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> X = rng.normal(size=(n, 3))
>>> A = rng.integers(0, 3, size=n)
>>> R = rng.normal(size=n)
>>> pi_target = rng.dirichlet(np.ones(3), size=n)
>>> res = sp.ips(X, A, R, pi_target)
>>> v = float(res.value)

snips ¶

snips(X: ndarray, A: ndarray, R: ndarray, pi_target: Any, pi_behavior: Optional[ndarray] = None, clip: float = 50.0, alpha: float = 0.05) -> OPEResult

Self-normalised IPS (reduces variance when importance-sampling weights are large, at the cost of a small bias).

Notes

If pi_behavior is None and the internal behavior-policy logistic regression fails, propensities fall back to uniform 1/K; a ConvergenceWarning is emitted and diagnostics['propensity_fallback'] is set to True.
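
To show what self-normalisation changes, the sketch below computes both the plain weighted mean and the self-normalised version, which divides by the sum of the weights instead of the sample size. The helper name is hypothetical, the clipping rule is an assumption, and the behavior policy is supplied directly.

```python
import numpy as np


def snips_sketch(A, R, pi_target, pi_behavior, clip=50.0):
    """Return (ips, snips) so the two normalisations can be compared."""
    idx = np.arange(len(A))
    w = np.minimum(pi_target[idx, A] / pi_behavior[idx, A], clip)
    ips = float(np.sum(w * R) / len(A))
    snips = float(np.sum(w * R) / np.sum(w))  # weights renormalised to sum to 1
    return ips, snips


A_demo = np.array([1, 1, 0])
R_demo = np.array([1.0, 3.0, 5.0])
pi_behavior_demo = np.array([[0.5, 0.5], [0.75, 0.25], [0.5, 0.5]])
pi_target_demo = np.tile(np.array([0.0, 1.0]), (3, 1))  # always play action 1
ips_value, snips_value = snips_sketch(A_demo, R_demo, pi_target_demo, pi_behavior_demo)
```

In this toy log the rewards observed under action 1 are 1 and 3. The self-normalised estimate stays inside that range, while the plain IPS estimate is pushed above it by the weight of 4 on the second unit.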

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> X = rng.normal(size=(n, 3))
>>> A = rng.integers(0, 3, size=n)
>>> R = rng.normal(size=n)
>>> pi_target = rng.dirichlet(np.ones(3), size=n)
>>> res = sp.snips(X, A, R, pi_target)
>>> v = float(res.value)

doubly_robust ¶

doubly_robust(X: ndarray, A: ndarray, R: ndarray, pi_target: Any, pi_behavior: Optional[ndarray] = None, n_actions: Optional[int] = None, clip: float = 50.0, alpha: float = 0.05) -> OPEResult

Doubly-robust OPE (Dudik et al. 2011).

Notes

If pi_behavior is None and the internal behavior-policy logistic regression fails, propensities fall back to uniform 1/K; a ConvergenceWarning is emitted and diagnostics['propensity_fallback'] is set to True.
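
A minimal sketch of the doubly robust combination, assuming reward-model predictions `q_hat` (shape `(n, K)`) and behavior probabilities are already available: the direct-method term is corrected by an importance-weighted residual on the logged action. `dr_sketch` and `q_hat` are hypothetical names, and the clipping rule is an assumption.

```python
import numpy as np


def dr_sketch(A, R, pi_target, pi_behavior, q_hat, clip=50.0):
    """Doubly robust value: model-based term plus weighted residual correction."""
    idx = np.arange(len(A))
    dm_term = np.sum(pi_target * q_hat, axis=1)  # expected reward under target
    w = np.minimum(pi_target[idx, A] / pi_behavior[idx, A], clip)
    correction = w * (R - q_hat[idx, A])         # residual on the logged action
    return float(np.mean(dm_term + correction))


A_demo = np.array([0, 1])
R_demo = np.array([1.0, 2.0])
pi_behavior_demo = np.full((2, 2), 0.5)
pi_target_demo = np.tile(np.array([0.0, 1.0]), (2, 1))  # always play action 1
q_hat_demo = np.tile(np.array([0.5, 1.5]), (2, 1))      # made-up predictions
dr_value = dr_sketch(A_demo, R_demo, pi_target_demo, pi_behavior_demo, q_hat_demo)
```

Here the reward model under-predicts action 1 by 0.5, and the weighted residual from the one unit that logged action 1 moves the estimate from 1.5 to 2.0.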

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> X = rng.normal(size=(n, 3))
>>> A = rng.integers(0, 3, size=n)
>>> R = rng.normal(size=n)
>>> pi_target = rng.dirichlet(np.ones(3), size=n)
>>> res = sp.doubly_robust(X, A, R, pi_target)
>>> v = float(res.value)
