`statspai.causal_llm`¶

causal_llm ¶

LLM × Causal Inference (StatsPAI v0.10).

Three integration points where large language models help causal analysis without replacing the formal estimator:

:func:llm_dag_propose — propose candidate DAGs from variable names + domain description (Kiciman-Sharma 2025, arXiv 2402.11068).
:func:llm_unobserved_confounders — generate plausible unobserved confounder candidates for E-value sensitivity analysis (arXiv 2603.14273).
:func:llm_sensitivity_priors — propose Cornfield-style sensitivity parameter priors based on the substantive context.

All three are offline by default: they ship with deterministic heuristic backends, so they work without an API key. If a real LLM client (OpenAI / Anthropic / local) is available via the optional [llm] extra, pass it through the client keyword argument.

The deterministic backends are designed to be transparent: they return reproducible candidates derived from variable-name pattern matching and domain heuristics, not silent fabrications.
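As a rough illustration of what "variable-name pattern matching" can look like, the sketch below assigns roles from substrings and then wires confounders into the treatment and the outcome. The keyword lists and the helper names classify_role and propose_edges are hypothetical and are not taken from the library; the real heuristic backend may use different patterns.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lists; the library's actual patterns are not documented here.
TREATMENT_HINTS = ("treat", "policy", "intervention")
OUTCOME_HINTS = ("wage", "earning", "outcome", "mortality")


def classify_role(name: str) -> str:
    """Assign a role by looking for hint substrings in the variable name."""
    lowered = name.lower()
    if any(hint in lowered for hint in TREATMENT_HINTS):
        return "treatment"
    if any(hint in lowered for hint in OUTCOME_HINTS):
        return "outcome"
    return "confounder"  # assumption: everything else is a candidate confounder


def propose_edges(variables: List[str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (roles, edges) using only the variable names, with no randomness."""
    roles = {v: classify_role(v) for v in variables}
    treatments = [v for v in variables if roles[v] == "treatment"]
    outcomes = [v for v in variables if roles[v] == "outcome"]
    confounders = [v for v in variables if roles[v] == "confounder"]

    # Treatment -> outcome first, then each confounder into both.
    edges = [(t, o) for t in treatments for o in outcomes]
    for c in confounders:
        edges += [(c, t) for t in treatments]
        edges += [(c, o) for o in outcomes]
    return roles, edges


roles, edges = propose_edges(["education", "treatment", "wage", "age"])
print(roles)
print(edges)
```

Because the rules depend only on the input names, repeated calls with the same variables give the same proposal, which is the reproducibility property described above.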

LLMDAGProposal `dataclass` ¶

Result of an LLM (or heuristic) DAG proposal.

Returned by :func:llm_dag_propose. Carries the proposed directed edges, a per-variable role classification, one rationale sentence per edge, and the backend that produced them. Use :meth:to_dag_string to feed the proposal straight into sp.dag(...).

Examples:

>>> import statspai as sp
>>> prop = sp.llm_dag_propose(
...     variables=["treatment", "wage", "age"],
...     domain="labor economics",
... )
>>> isinstance(prop, sp.LLMDAGProposal)
True
>>> prop.backend
'heuristic'
>>> prop.to_dag_string()
'treatment -> wage; age -> treatment; age -> wage'

to_dag_string ¶

to_dag_string() -> str

Format as 'A -> B; C -> D' for sp.dag(...).
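The string format is simple enough to reproduce by hand. The following sketch is not the library's implementation; edges_to_dag_string and dag_string_to_edges are hypothetical helpers that show the 'A -> B; C -> D' convention and how to parse it back into an edge list.

```python
from typing import List, Tuple

Edge = Tuple[str, str]


def edges_to_dag_string(edges: List[Edge]) -> str:
    """Join (parent, child) pairs as 'parent -> child', separated by '; '."""
    return "; ".join(f"{parent} -> {child}" for parent, child in edges)


def dag_string_to_edges(text: str) -> List[Edge]:
    """Inverse of edges_to_dag_string for well-formed input."""
    edges = []
    for chunk in text.split(";"):
        if "->" not in chunk:
            continue  # skip empty or malformed pieces
        parent, child = (part.strip() for part in chunk.split("->", 1))
        edges.append((parent, child))
    return edges


example_edges = [("treatment", "wage"), ("age", "treatment"), ("age", "wage")]
dag_string = edges_to_dag_string(example_edges)
print(dag_string)
```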

UnobservedConfounderProposal `dataclass` ¶

List of plausible unobserved confounders + suggested E-values.

Produced by :func:llm_unobserved_confounders. Carries the candidate confounder names (.candidates), the matching E-value thresholds needed to nullify the observed effect (.suggested_evalue_thresholds), the .domain, the .backend used, and a formatted .summary().

Examples:

>>> import statspai as sp
>>> prop = sp.llm_unobserved_confounders(
...     treatment="statin use",
...     outcome="cardiovascular mortality",
...     domain="health",
...     point_estimate_rr=1.5,
... )
>>> type(prop).__name__
'UnobservedConfounderProposal'
>>> prop.backend
'heuristic'
>>> bool(len(prop.candidates) > 0)
True
>>> bool(isinstance(prop.summary(), str))
True

SensitivityPriorProposal `dataclass` ¶

Suggested sensitivity parameter priors for sensemakr-style analysis.

Returned by :func:llm_sensitivity_priors. Bundles the proposed rho_max / r2 Cinelli-Hazlett sensitivity bounds with the domain, rationale, and which backend produced them.

Examples:

>>> import statspai as sp
>>> res = sp.llm_sensitivity_priors(treatment="schooling",
...                                 outcome="earnings",
...                                 domain="labor")
>>> isinstance(res, sp.SensitivityPriorProposal)
True
>>> isinstance(res.summary(), str)
True

CausalMASResult `dataclass` ¶

Bases: ResultProtocolMixin

Structured output of :func:causal_mas.

Attributes:

Name	Type	Description
`edges`	`list of (parent, child)`	Consensus edge list surviving the debate at `final_threshold`.
`confidence`	`dict {(p, c): float}`	Fraction of agents / rounds that endorsed the edge (in `[0, 1]`).
`roles`	`dict {var: role}`	Proposer-assigned roles (treatment / outcome / confounder / instrument / unknown).
`transcript`	`list of dict`	Round-by-round debate log. Each entry has keys `{round, agent, action, payload}` so reviewers can audit how the consensus formed.
`rounds`	`int`
`backend`	`str`	`'heuristic'` or the LLM client's repr.
`final_threshold`	`float`	Confidence cutoff that produced `edges`.
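To make the relationship between confidence, final_threshold, and edges concrete, here is a minimal sketch that treats each agent/round as one ballot of endorsed edges. The ballot structure and the helper name consensus_edges are assumptions for illustration; the library's debate protocol may weight agents or rounds differently.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def consensus_edges(
    ballots: List[List[Edge]], threshold: float
) -> Tuple[List[Edge], Dict[Edge, float]]:
    """Keep edges whose endorsement fraction is at least `threshold`."""
    # Count each edge at most once per ballot.
    counts = Counter(edge for ballot in ballots for edge in set(ballot))
    confidence = {edge: count / len(ballots) for edge, count in counts.items()}
    edges = sorted(edge for edge, score in confidence.items() if score >= threshold)
    return edges, confidence


ballots = [
    [("age", "treatment"), ("treatment", "outcome")],
    [("treatment", "outcome")],
    [("treatment", "outcome"), ("outcome", "age")],
    [("age", "treatment"), ("treatment", "outcome")],
]
kept, confidence = consensus_edges(ballots, threshold=0.5)
print(kept)
print(confidence)
```

Raising the threshold keeps only edges that nearly every ballot endorsed; lowering it admits more speculative edges.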

LLMClient ¶

Minimal interface expected by :func:causal_mas and friends.

Subclasses / adapters must implement chat(role, prompt) — a single-turn completion call that returns the model's plain-text response. Everything else (streaming, tools, JSON mode) is deliberately out of scope because :func:causal_mas only needs a bag of edge proposals or critiques.
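A minimal adapter satisfying this interface could look like the sketch below. ScriptedClient, its history attribute, and parse_edge_lines are hypothetical names used only for illustration; the one thing taken from the description above is the chat(role, prompt) -> str contract.

```python
from typing import Callable, Dict, List, Tuple


class ScriptedClient:
    """Single-turn client: maps (role, prompt) to plain text, no network calls."""

    def __init__(self, response_fn: Callable[[str, str], str]) -> None:
        self._response_fn = response_fn
        self.history: List[Dict[str, str]] = []  # hypothetical audit log

    def chat(self, role: str, prompt: str) -> str:
        reply = self._response_fn(role, prompt)
        self.history.append({"role": role, "prompt": prompt, "response": reply})
        return reply


def parse_edge_lines(text: str) -> List[Tuple[str, str]]:
    """Read one 'parent -> child' proposal per line, ignoring other lines."""
    edges = []
    for line in text.splitlines():
        if "->" in line:
            parent, child = (part.strip() for part in line.split("->", 1))
            if parent and child:
                edges.append((parent, child))
    return edges


def scripted(role: str, prompt: str) -> str:
    if role == "proposer":
        return "age -> treatment\ntreatment -> outcome"
    return ""


client = ScriptedClient(scripted)
proposed = parse_edge_lines(client.chat("proposer", "Propose edges."))
critique = parse_edge_lines(client.chat("critic", "Critique the edges."))
print(proposed, critique)
```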

LLMConstrainedDAGResult `dataclass` ¶

Bases: ResultProtocolMixin

Output of :func:llm_dag_constrained.

Attributes:

Name	Type	Description
`final_edges`	`list of (str, str)`	Directed edges in the final CPDAG.
`edge_confidence`	`DataFrame`	One row per candidate edge with columns `edge` (tuple), `llm_score` (float in [0,1] or NaN), `ci_pvalue` (float or NaN), `retained` (bool), `source` (one of `'required'` / `'forbidden'` / `'ci-test'`).
`iteration_log`	`list of dict`	Per-iteration structured trace of which edges were proposed, validated, demoted.
`skeleton`	`DataFrame`	Final undirected adjacency matrix (variables x variables).
`cpdag`	`DataFrame`	Final CPDAG adjacency matrix.
`variables`	`list of str`
`n_obs`	`int`
`alpha`	`float`
`converged`	`bool`	`True` if the loop stopped early because no edges were demoted in the most recent iteration.
`provenance`	`dict`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x = rng.normal(size=n)
>>> z = 0.8 * x + rng.normal(size=n)
>>> y = 1.2 * x + 0.5 * z + rng.normal(size=n)
>>> data = pd.DataFrame({"X": x, "Z": z, "Y": y})
>>> def oracle(vars_, desc):
...     return [("X", "Y", 0.95), ("X", "Z", 0.9)]
>>> res = sp.llm_dag_constrained(
...     data, variables=["X", "Z", "Y"], oracle=oracle, max_iter=2)
>>> isinstance(res, sp.LLMConstrainedDAGResult)
True
>>> bool(len(res.final_edges) >= 1)
True

to_dag ¶

to_dag() -> Any

Convert the final CPDAG into a :class:statspai.dag.DAG.

DAGValidationResult `dataclass` ¶

Bases: ResultProtocolMixin

Output of :func:llm_dag_validate.

Attributes:

Name	Type	Description
`edge_evidence`	`DataFrame`	One row per declared edge with columns `edge` (tuple), `declared` (bool), `ci_pvalue` (float or NaN) and `supported` (bool -- `True` when the partial-correlation CI test rejects conditional independence at `alpha`, i.e. the data backs keeping the edge).
`n_supported`	`int`	Number of edges the data supports.
`n_unsupported`	`int`	Number of edges the data does not support.
`alpha`	`float`	CI-test significance level used.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> x = rng.normal(size=n)
>>> z = 0.8 * x + rng.normal(size=n)
>>> y = 1.2 * x + 0.5 * z + rng.normal(size=n)
>>> data = pd.DataFrame({"X": x, "Z": z, "Y": y})
>>> g = sp.dag("X -> Z; X -> Y; Z -> Y")
>>> res = sp.llm_dag_validate(g, data)
>>> isinstance(res, sp.DAGValidationResult)
True
>>> res.n_supported
3

LLMConfigurationError ¶

Bases: RuntimeError

Raised when no LLM provider can be resolved.

The message points the user at concrete remediation steps rather than a generic "no key found", so agents and CLI users alike know exactly what to type next.

llm_dag_propose ¶

llm_dag_propose(variables: List[str], domain: str = '', client: Optional[Any] = None, seed: int = 0) -> LLMDAGProposal

Propose a candidate DAG from variable names + domain description.

Parameters:

Name	Type	Description	Default
`variables`	`list of str`	Names of variables in the dataset.	required
`domain`	`str`	Free-text domain description (e.g. "labor economics, education and earnings"). Helps the LLM backend; ignored by the heuristic backend.	`''`
`client`	`object`	An LLM client implementing `.complete(prompt: str) -> str`. If `None`, use the deterministic heuristic backend.	`None`
`seed`	`int`		`0`

Returns:

Type	Description
`LLMDAGProposal`

Examples:

>>> import statspai as sp
>>> prop = sp.llm_dag_propose(
...     variables=["education", "treatment", "wage", "age"],
...     domain="labor economics: schooling and earnings",
... )
>>> prop.backend
'heuristic'
>>> prop.roles["treatment"]
'treatment'
>>> prop.roles["wage"]
'outcome'
>>> ("treatment", "wage") in prop.edges
True
>>> prop.to_dag_string()
'treatment -> wage; education -> treatment; education -> wage; age -> treatment; age -> wage'

llm_unobserved_confounders ¶

llm_unobserved_confounders(treatment: str, outcome: str, domain: str = 'health', client: Optional[Any] = None, point_estimate_rr: float = 1.5) -> UnobservedConfounderProposal

Enumerate plausible unobserved confounders for a study.

Parameters:

Name	Type	Description	Default
`treatment`	`str`	Free-text description of the treatment (used by the LLM backend, ignored by the heuristic backend).	required
`outcome`	`str`	Free-text description of the outcome (used by the LLM backend, ignored by the heuristic backend).	required
`domain`	`('health', 'education', 'labor', 'policy', 'marketing')`		`'health'`
`client`	`object`	LLM client with `.complete(prompt: str) -> str`.	`None`
`point_estimate_rr`	`float`	Observed risk ratio; suggested E-values are scaled relative to this so the user can read "to nullify a RR of X you'd need an unobserved RR of Y".	`1.5`
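For readers who want to check the "to nullify a RR of X you'd need an unobserved RR of Y" reading by hand, the standard E-value formula of VanderWeele and Ding is E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, with protective estimates inverted first. The sketch below implements that textbook formula; whether the library scales its suggested thresholds exactly this way is an assumption, and e_value is a hypothetical helper name.

```python
import math


def e_value(rr: float) -> float:
    """Textbook E-value for a risk ratio (not necessarily the library's scaling)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))


observed_rr = 1.5
needed = e_value(observed_rr)
print(f"To nullify RR={observed_rr}, a confounder RR of about {needed:.2f} is needed")
```

Under this formula, a null estimate (RR = 1) has an E-value of 1, meaning no unmeasured confounding is needed to explain it away.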

Returns:

Type	Description
`UnobservedConfounderProposal`

Examples:

>>> import statspai as sp
>>> prop = sp.llm_unobserved_confounders(
...     treatment="statin use",
...     outcome="cardiovascular mortality",
...     domain="health",
...     point_estimate_rr=1.5,
... )
>>> prop.backend
'heuristic'
>>> prop.domain
'health'
>>> bool(len(prop.candidates) == len(prop.suggested_evalue_thresholds))
True

llm_sensitivity_priors ¶

llm_sensitivity_priors(treatment: str, outcome: str, domain: str = 'health', client: Optional[Any] = None) -> SensitivityPriorProposal

Propose sensitivity-analysis priors for the substantive setting.

Parameters:

Name	Type	Description	Default
`treatment`	`str`		required
`outcome`	`str`		required
`domain`	`('health', 'education', 'labor', 'policy', 'marketing')`		`'health'`
`client`	`object`	LLM client with `.complete(prompt: str) -> str`.	`None`

Returns:

Type	Description
`SensitivityPriorProposal`

Examples:

>>> import statspai as sp
>>> res = sp.llm_sensitivity_priors(treatment="smoking",
...                                 outcome="weight_change",
...                                 domain="health")
>>> type(res).__name__
'SensitivityPriorProposal'
>>> res.backend
'heuristic'
>>> (res.rho_max, res.r2)
(0.3, 0.04)

References

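Cinelli, C., and Hazlett, C. (2020). Making sense of sensitivity: extending omitted variable bias. Journal of the Royal Statistical Society: Series B, 82(1), 39-67.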

openai_client ¶

openai_client(model: str = 'gpt-4o-mini', *, api_key: Optional[str] = None, base_url: Optional[str] = None, organization: Optional[str] = None, temperature: float = 0.0, max_tokens: int = 1024, max_retries: int = 3, system_prompt: Optional[str] = None) -> LLMClient

Construct an OpenAI-compatible :class:LLMClient.

Requires the optional openai>=1.0 extra. Accepts any base_url override so you can point this at an OpenAI-compatible endpoint (Azure OpenAI, vLLM, Ollama's OpenAI-compat mode, ...).

Examples:

>>> import statspai as sp
>>> client = sp.causal_llm.openai_client(
...     model="gpt-4o-mini", api_key="sk-...",
... )
>>> res = sp.causal_llm.causal_mas(
...     variables=["age","treatment","outcome"], client=client,
... )

anthropic_client ¶

anthropic_client(model: str = 'claude-opus-4-7', *, api_key: Optional[str] = None, base_url: Optional[str] = None, temperature: float = 0.0, max_tokens: int = 1024, max_retries: int = 3, thinking_budget: int = 0, system_prompt: Optional[str] = None) -> LLMClient

Construct an Anthropic-compatible :class:LLMClient.

Requires the optional anthropic>=0.30 extra. Defaults to Claude Opus 4.7 (the latest generally-available model as of StatsPAI v1.4).

Parameters:

Name	Type	Description	Default
`thinking_budget`	`int`	Enable Claude extended thinking with this many reasoning tokens. Values `>= 1024` activate the feature; `0` disables it (the legacy behaviour). The budget is consumed from `max_tokens`, so set `max_tokens` comfortably above `thinking_budget + expected_answer_tokens`. The reasoning trace is captured on `client.history[-1]['thinking']` for auditability but is not returned as part of the chat response.	`0`
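The token arithmetic described for thinking_budget can be checked before constructing a client. The helper below is a hypothetical pre-flight check, not part of the library; in particular, rejecting budgets between 1 and 1023 is an assumption, since the description only states that 0 disables the feature and values >= 1024 enable it.

```python
def answer_token_headroom(max_tokens: int, thinking_budget: int) -> int:
    """Tokens left for the visible answer once the thinking budget is deducted."""
    if thinking_budget == 0:
        return max_tokens  # extended thinking disabled
    if thinking_budget < 1024:
        # Assumption: budgets below the documented activation level are rejected.
        raise ValueError("thinking_budget must be 0 or at least 1024")
    headroom = max_tokens - thinking_budget
    if headroom <= 0:
        raise ValueError("max_tokens must exceed thinking_budget")
    return headroom


headroom = answer_token_headroom(max_tokens=8192, thinking_budget=4096)
print(headroom)
```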

Examples:

>>> import statspai as sp
>>> client = sp.causal_llm.anthropic_client(
...     model="claude-opus-4-7",
...     thinking_budget=4096,
...     max_tokens=8192,
... )

echo_client ¶

echo_client(response_fn: Callable[[str, str], str]) -> LLMClient

Deterministic scripted-response client for testing.

This is a network-free test double: response_fn maps (role, prompt) to a canned string, so multi-agent causal discovery runs deterministically without any LLM SDK or API key.

Examples:

>>> import statspai as sp
>>> def scripted(role, prompt):
...     if role == 'proposer':
...         return 'age -> treatment\ntreatment -> outcome'
...     return ''
>>> client = sp.causal_llm.echo_client(scripted)
>>> type(client).__name__
'_EchoClient'
>>> res = sp.causal_llm.causal_mas(
...     variables=['age', 'treatment', 'outcome'], client=client,
... )
>>> ('treatment', 'outcome') in res.edges
True

llm_dag_constrained ¶

llm_dag_constrained(data: DataFrame, variables: Optional[Sequence[str]] = None, descriptions: Optional[Dict[str, str]] = None, *, oracle: Optional[Callable[[Sequence[str], Dict[str, str]], Any]] = None, alpha: float = 0.05, ci_test: str = 'fisherz', max_iter: int = 3, high_conf_threshold: float = 0.7, low_conf_threshold: float = 0.3, forbid_low_conf: bool = False, verbose: bool = False) -> LLMConstrainedDAGResult

Closed-loop LLM-assisted causal discovery.

Iterate propose → constrain → CI-validate → demote until the proposed required-edge set stops shrinking or max_iter is hit.
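The control flow of such a loop can be sketched in a few lines. This is a simplified stand-in, not the library's algorithm: the PC step is omitted, p-values come from a fixed lookup table rather than being recomputed each iteration, and closed_loop is a hypothetical name.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str]


def closed_loop(
    required: Set[Edge],
    pvalue_of: Callable[[Edge], float],
    alpha: float = 0.05,
    max_iter: int = 3,
) -> Tuple[Set[Edge], bool, List[Dict[str, object]]]:
    """Demote required edges the CI test does not support until none are demoted."""
    required = set(required)
    log: List[Dict[str, object]] = []
    converged = False
    for iteration in range(1, max_iter + 1):
        # Validate: an edge is demoted when its p-value exceeds alpha.
        demoted = {edge for edge in required if pvalue_of(edge) > alpha}
        required -= demoted
        log.append({"iteration": iteration, "demoted": sorted(demoted)})
        if not demoted:  # the required-edge set stopped shrinking
            converged = True
            break
    return required, converged, log


# Stand-in p-values; a real run would compute these from data.
pvalues = {("X", "Y"): 0.001, ("X", "Z"): 0.0001, ("Z", "W"): 0.4}
final_required, converged, log = closed_loop(set(pvalues), pvalues.get, max_iter=2)
print(sorted(final_required), converged, log)
```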

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Observational data.	required
`variables`	`sequence of str`	Subset of columns to include in the discovery. Defaults to all numeric columns of `data`.	`None`
`descriptions`	`dict`	Variable name -> human-readable description (passed to the oracle).	`None`
`oracle`	`callable`	Function `f(variables, descriptions) -> list[(from, to[, confidence])]`. When omitted, the loop falls back to plain PC discovery (data-only) and returns a single-iteration result with no LLM scores.	`None`
`alpha`	`float`	CI-test significance level for both PC and the validation pass.	`0.05`
`ci_test`	`'fisherz'`	Conditional independence test.	`'fisherz'`
`max_iter`	`int`	Upper bound on the number of propose-validate cycles.	`3`
`high_conf_threshold`	`float`	Minimum LLM confidence to inject the edge as a required background-knowledge constraint into PC.	`0.7`
`low_conf_threshold`	`float`	Maximum LLM confidence below which the edge is treated as a forbidden candidate (only when `forbid_low_conf=True`).	`0.3`
`forbid_low_conf`	`bool`	When True, low-confidence edges are forbidden in the PC skeleton instead of being passed through as plain candidates. Off by default — most LLMs return only positive edges and we don't want to over-prune.	`False`
`verbose`	`bool`	Print per-iteration progress.	`False`
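The interaction of high_conf_threshold, low_conf_threshold, and forbid_low_conf can be illustrated with a small routing function. The sketch is an interpretation of the parameter descriptions above, not library code: route_edges is a hypothetical name, the boundary comparisons (>= for high, < for low) are assumptions, and edges without a confidence score are assumed to pass through as plain candidates.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def route_edges(
    proposals: Sequence[tuple],
    high_conf_threshold: float = 0.7,
    low_conf_threshold: float = 0.3,
    forbid_low_conf: bool = False,
) -> Dict[str, List[Edge]]:
    """Sort oracle proposals into required / forbidden / candidate buckets."""
    buckets: Dict[str, List[Edge]] = {"required": [], "forbidden": [], "candidate": []}
    for item in proposals:
        edge = (item[0], item[1])
        confidence = item[2] if len(item) > 2 else None  # confidence is optional
        if confidence is not None and confidence >= high_conf_threshold:
            buckets["required"].append(edge)
        elif forbid_low_conf and confidence is not None and confidence < low_conf_threshold:
            buckets["forbidden"].append(edge)
        else:
            buckets["candidate"].append(edge)
    return buckets


proposals = [("X", "Y", 0.95), ("X", "Z", 0.5), ("Z", "W", 0.1), ("Y", "W")]
default_routing = route_edges(proposals)
strict_routing = route_edges(proposals, forbid_low_conf=True)
print(default_routing)
print(strict_routing)
```

With the default forbid_low_conf=False, the low-confidence edge stays a candidate, which mirrors the stated goal of not over-pruning.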

Returns:

Type	Description
`LLMConstrainedDAGResult`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x = rng.normal(size=n)
>>> z = 0.8 * x + rng.normal(size=n)
>>> y = 1.2 * x + 0.5 * z + rng.normal(size=n)
>>> data = pd.DataFrame({"X": x, "Z": z, "Y": y})
>>> def echo_oracle(vars_, desc):
...     return [("X", "Y", 0.95), ("X", "Z", 0.9)]
>>> r = sp.llm_dag_constrained(
...     data, variables=["X", "Z", "Y"], oracle=echo_oracle, max_iter=2)
>>> bool(len(r.final_edges) >= 1)
True

llm_dag_validate ¶

llm_dag_validate(dag: Any, data: DataFrame, *, alpha: float = 0.05, ci_test: str = 'fisherz') -> DAGValidationResult

Per-edge CI-test validation of a declared DAG.

For every directed edge a -> b in dag, run a partial-correlation CI test of a ⟂ b | parents(b) \ {a}. Edges with p-value <= alpha are supported (the test rejects conditional independence, so the data give no reason to remove them); edges with p-value > alpha are unsupported (the data are consistent with the conditional independence implied by removing the edge).
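The test itself can be sketched with the standard library for the case of a single conditioning variable. This is a textbook Fisher-z partial-correlation test written for illustration; pearson and fisher_z_pvalue are hypothetical helpers, and the library's 'fisherz' implementation (which must handle larger conditioning sets) may differ in detail.

```python
import math
import random
from statistics import NormalDist
from typing import List, Optional


def pearson(u: List[float], v: List[float]) -> float:
    """Sample Pearson correlation."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def fisher_z_pvalue(a: List[float], b: List[float], given: Optional[List[float]] = None) -> float:
    """Two-sided p-value for a ⟂ b, optionally conditioning on one variable."""
    n, k = len(a), 0
    r = pearson(a, b)
    if given is not None:
        r_ag, r_bg = pearson(a, given), pearson(b, given)
        # First-order partial correlation of a and b given one variable.
        r = (r - r_ag * r_bg) / math.sqrt((1 - r_ag**2) * (1 - r_bg**2))
        k = 1
    z = math.atanh(r) * math.sqrt(n - k - 3)  # Fisher z-transform
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(0)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [1.2 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

alpha = 0.05
p_x_to_y = fisher_z_pvalue(x, y, given=z)  # edge X -> Y, parents(Y) \ {X} = {Z}
p_z_to_y = fisher_z_pvalue(z, y, given=x)  # edge Z -> Y, parents(Y) \ {Z} = {X}
print("X -> Y supported:", p_x_to_y <= alpha)
print("Z -> Y supported:", p_z_to_y <= alpha)
```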

Parameters:

Name	Type	Description	Default
`dag`	statspai.dag.DAG or object exposing an ``edges`` attribute	Declared causal graph. Latent `_L_*` nodes are ignored.	required
`data`	`DataFrame`		required
`alpha`	`float`		`0.05`
`ci_test`	`'fisherz'`		`'fisherz'`

Returns:

Type	Description
`DAGValidationResult`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> x = rng.normal(size=n)
>>> z = 0.8 * x + rng.normal(size=n)
>>> y = 1.2 * x + 0.5 * z + rng.normal(size=n)
>>> data = pd.DataFrame({"X": x, "Z": z, "Y": y})
>>> g = sp.dag("X -> Z; X -> Y; Z -> Y")
>>> res = sp.llm_dag_validate(g, data, alpha=0.05)
>>> res.n_supported, res.n_unsupported
(3, 0)

get_llm_client ¶

get_llm_client(*, client: Optional['LLMClient'] = None, provider: Optional[str] = None, model: Optional[str] = None, api_key: Optional[str] = None, allow_interactive: bool = True, config_path: Optional[Path] = None, **kwargs: Any) -> 'LLMClient'

Resolve an :class:LLMClient via layered fallback.

See module docstring for the full resolution order. Most users don't call this directly — it's plumbed into sp.paper(..., llm='auto') and the LLM-DAG closed-loop entry points so the typical workflow is "set env var, forget".

Parameters:

Name	Type	Description	Default
`client`	`LLMClient`	Already-built client. Returned as-is. (Layer 1.)	`None`
`provider`	`('anthropic', 'openai')`	Force a specific provider. (Layer 2.)	`None`
`model`	`str`	Force a specific model. Defaults to `DEFAULT_MODELS[provider]`.	`None`
`api_key`	`str`	Pass-through to the constructed client. When omitted, the provider SDK reads from its standard env var (`ANTHROPIC_API_KEY` / `OPENAI_API_KEY`).	`None`
`allow_interactive`	`bool`	Whether to fall back to a stdin prompt when prior layers fail. Set to `False` in agent / Jupyter contexts where `input()` would hang the kernel.	`True`
`config_path`	`Path`	Override the config file location (mainly for testing).	`None`
`**kwargs`	`Any`	Forwarded to the provider's client constructor (e.g. `temperature`, `max_tokens`, `thinking_budget` for Anthropic, `base_url` for OpenAI).	`{}`

Raises:

Type	Description
`LLMConfigurationError`	When no path resolves and either `allow_interactive=False` or stdin is not a TTY (so interactive can't run).

list_available_providers ¶

list_available_providers() -> Dict[str, Dict[str, Any]]

Inspect the environment and return what's currently usable.

Returns:

Type	Description
`dict`	`{provider_name: {"available": bool, "default_model": str, "env_var": str}}` for each known provider. Useful both for the interactive prompt and for tools that want to surface available LLMs to the user.
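A rough picture of the returned structure, assuming availability simply means "the provider's environment variable is set", is sketched below. That definition of availability is an assumption (the real function may also check that the SDK is installed), sketch_list_providers is a hypothetical name, and the default models are copied from the openai_client and anthropic_client signatures rather than from DEFAULT_MODELS.

```python
from typing import Any, Dict, Mapping

# Env var names come from the get_llm_client description; default models are
# taken from the client constructor signatures (an assumption about DEFAULT_MODELS).
KNOWN_PROVIDERS = {
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "default_model": "claude-opus-4-7"},
    "openai": {"env_var": "OPENAI_API_KEY", "default_model": "gpt-4o-mini"},
}


def sketch_list_providers(env: Mapping[str, str]) -> Dict[str, Dict[str, Any]]:
    """Build {provider: {available, default_model, env_var}} from an env mapping."""
    return {
        name: {
            "available": bool(env.get(meta["env_var"])),
            "default_model": meta["default_model"],
            "env_var": meta["env_var"],
        }
        for name, meta in KNOWN_PROVIDERS.items()
    }


# Pass os.environ in real use; a plain dict keeps the example deterministic.
providers = sketch_list_providers({"OPENAI_API_KEY": "placeholder-key"})
usable = [name for name, info in providers.items() if info["available"]]
print(usable)
```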

configure_llm ¶

configure_llm(*, provider: Optional[str] = None, model: Optional[str] = None, config_path: Optional[Path] = None) -> Path

Persist a provider+model preference to ~/.config/statspai/llm.toml.

Use this for a "set once, forget" workflow when working on a machine with both ANTHROPIC_API_KEY and OPENAI_API_KEY set — without it, the resolver tie-breaks to Anthropic.

Examples:

>>> import statspai as sp
>>> sp.causal_llm.configure_llm(
...     provider="openai", model="gpt-4o",
... )

llm_config_path ¶

llm_config_path() -> Path

Return the platform-appropriate config file path.

Honours XDG_CONFIG_HOME on Linux/macOS; falls back to ~/.config if unset. Windows uses %APPDATA%.
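The lookup rule can be written out as a small pure function. This is a sketch under assumptions: sketch_config_path is a hypothetical name, the statspai/llm.toml suffix is taken from the configure_llm description, and both the Windows subdirectory layout and the fallback when APPDATA is unset are guesses.

```python
from pathlib import Path
from typing import Mapping


def sketch_config_path(env: Mapping[str, str], platform: str, home: Path) -> Path:
    """Resolve the config file location from an env mapping and platform name."""
    if platform.startswith("win"):
        # Assumption: fall back to the conventional roaming directory.
        base = Path(env.get("APPDATA") or home / "AppData" / "Roaming")
    else:
        # XDG_CONFIG_HOME wins when set and non-empty; otherwise ~/.config.
        base = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    return base / "statspai" / "llm.toml"


home = Path("/home/analyst")
default_path = sketch_config_path({}, "linux", home)
xdg_path = sketch_config_path({"XDG_CONFIG_HOME": "/tmp/cfg"}, "linux", home)
print(default_path)
print(xdg_path)
```

In real use the arguments would come from os.environ, sys.platform, and Path.home(); passing them explicitly keeps the rule easy to test.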

load_llm_config ¶

load_llm_config(path: Optional[Path] = None) -> Dict[str, Any]

Read the TOML preferences file, or return an empty dict.

Never raises on missing / malformed file — returns {} so callers can fall through cleanly to the next resolution layer.
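The "never raises" contract amounts to catching both I/O and parse errors. The sketch below shows one way to get that behaviour with the standard-library tomllib (Python 3.11+); sketch_load_config is a hypothetical name, and the top-level provider / model keys are an assumed file layout, not a documented schema.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters: behave as if no config exists
    tomllib = None


def sketch_load_config(path: Path) -> Dict[str, Any]:
    """Return the parsed TOML file, or {} when it is missing or malformed."""
    if tomllib is None:
        return {}
    try:
        with open(path, "rb") as handle:
            return tomllib.load(handle)
    except (OSError, ValueError):
        # OSError covers missing/unreadable files; TOMLDecodeError is a ValueError.
        return {}


with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "absent.toml"
    broken = Path(tmp) / "broken.toml"
    good = Path(tmp) / "good.toml"
    broken.write_text("provider = ", encoding="utf-8")  # invalid: no value
    good.write_text('provider = "openai"\nmodel = "gpt-4o"\n', encoding="utf-8")

    missing_result = sketch_load_config(missing)
    broken_result = sketch_load_config(broken)
    good_result = sketch_load_config(good)

print(missing_result, broken_result, good_result)
```

Returning an empty dict in every failure case lets a resolver treat "no preference recorded" and "preference unreadable" identically and move on to its next layer.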
