
statspai.causal_llm

causal_llm

LLM × Causal Inference (StatsPAI v0.10).

Three integration points where large language models help causal analysis without replacing the formal estimator:

  • llm_dag_propose: propose candidate DAGs from variable names + domain description (Kiciman-Sharma 2025, arXiv 2402.11068).
  • llm_unobserved_confounders: generate plausible unobserved confounder candidates for E-value sensitivity analysis (arXiv 2603.14273).
  • llm_sensitivity_priors: propose Cornfield-style sensitivity parameter priors based on the substantive context.

All three are offline by default: they ship with deterministic heuristic backends, so they work without an API key. If a real LLM client (OpenAI / Anthropic / local) is available via the optional [llm] extra, pass it through the client keyword argument.

The deterministic backends are designed to be transparent: they return reproducible candidates derived from variable-name pattern matching and domain heuristics, not silent fabrications.
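To make "variable-name pattern matching" concrete, here is a hypothetical sketch of how a deterministic backend of this kind could work. The keyword lists, the role names, and the rule "confounders point into treatment and outcome, treatment points into outcome" are illustrative assumptions, not StatsPAI's actual heuristic.

```python
# Hypothetical sketch of a deterministic, name-based DAG heuristic.
# The keyword lists and edge rules are illustrative assumptions only;
# they are not StatsPAI's actual heuristic backend.
from typing import Dict, List, Sequence, Tuple

# Checked in this order, so "wage" is an outcome even though it contains "age".
ROLE_PATTERNS: Dict[str, Tuple[str, ...]] = {
    "treatment": ("treat", "policy", "program", "dose"),
    "outcome": ("outcome", "earn", "wage", "mortality"),
    "confounder": ("age", "sex", "gender", "income", "educ"),
}


def assign_role(name: str) -> str:
    """Map a variable name to a role by case-insensitive substring match."""
    lowered = name.lower()
    for role, keywords in ROLE_PATTERNS.items():
        if any(key in lowered for key in keywords):
            return role
    return "unknown"


def propose_edges(variables: Sequence[str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return (roles, edges); the same input always yields the same output."""
    roles = {v: assign_role(v) for v in variables}
    treatments = [v for v in variables if roles[v] == "treatment"]
    outcomes = [v for v in variables if roles[v] == "outcome"]
    confounders = [v for v in variables if roles[v] == "confounder"]
    edges: List[Tuple[str, str]] = []
    for c in confounders:            # confounders -> treatment and outcome
        for target in treatments + outcomes:
            edges.append((c, target))
    for t in treatments:             # treatment -> outcome
        for o in outcomes:
            edges.append((t, o))
    return roles, edges


roles, edges = propose_edges(["age", "treatment", "outcome"])
print(roles)
print(edges)
```

Because every rule is a fixed lookup, rerunning the sketch on the same variable list always produces the same candidates, which is the reproducibility property described for the built-in backends.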

LLMDAGProposal dataclass

Result of an LLM (or heuristic) DAG proposal.

to_dag_string

to_dag_string() -> str

Format the proposed edges as a string of the form 'A -> B; C -> D', suitable for sp.dag(...).
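A minimal sketch of that string format, assuming the proposal holds its edges as (parent, child) string tuples (the dataclass fields are not listed on this page, so that storage is an assumption). Both helper names are hypothetical; the inverse parser only shows that the format round-trips.

```python
# Hypothetical helpers illustrating the 'A -> B; C -> D' edge-string format.
# Assumes edges are (parent, child) string tuples; not StatsPAI's own code.
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def edges_to_dag_string(edges: Iterable[Edge]) -> str:
    """Join (parent, child) pairs as 'parent -> child', separated by '; '."""
    return "; ".join(f"{parent} -> {child}" for parent, child in edges)


def dag_string_to_edges(spec: str) -> List[Edge]:
    """Inverse of edges_to_dag_string, ignoring empty segments."""
    edges: List[Edge] = []
    for segment in spec.split(";"):
        if not segment.strip():
            continue
        parent, child = (part.strip() for part in segment.split("->"))
        edges.append((parent, child))
    return edges


spec = edges_to_dag_string([("A", "B"), ("C", "D")])
print(spec)  # A -> B; C -> D
assert dag_string_to_edges(spec) == [("A", "B"), ("C", "D")]
```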

UnobservedConfounderProposal dataclass

List of plausible unobserved confounders and their suggested E-values.

SensitivityPriorProposal dataclass

Suggested sensitivity parameter priors for sensemakr-style analysis.

CausalMASResult dataclass

Structured output of causal_mas.

Attributes:

  • edges (list of (parent, child)): Consensus edge list surviving the debate at final_threshold.
  • confidence (dict {(p, c): float}): Fraction of agents / rounds that endorsed the edge, in [0, 1].
  • roles (dict {var: role}): Proposer-assigned roles (treatment / outcome / confounder / instrument / unknown).
  • transcript (list of dict): Round-by-round debate log. Each entry has keys {round, agent, action, payload} so reviewers can audit how the consensus formed.
  • rounds (int)
  • backend (str): 'heuristic' or the LLM client's repr.
  • final_threshold (float): Confidence cutoff that produced edges.
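The relationship between confidence, final_threshold, and edges can be sketched as a vote count. This is a hypothetical illustration: it treats each agent/round as one ballot listing the edges it endorsed, and it assumes an edge survives when its confidence is at least final_threshold. The exact ballot structure and tie rule inside causal_mas are not documented here.

```python
# Hypothetical sketch: edge confidence as the share of ballots endorsing it.
# Ballot structure and the >= cut-off rule are assumptions, not causal_mas internals.
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def consensus_edges(
    ballots: Sequence[Sequence[Edge]], final_threshold: float = 0.5
) -> Tuple[List[Edge], Dict[Edge, float]]:
    """Return (edges, confidence) where every confidence lies in [0, 1]."""
    n = len(ballots)
    # set() so one ballot cannot endorse the same edge twice
    counts = Counter(edge for ballot in ballots for edge in set(ballot))
    confidence = {edge: k / n for edge, k in counts.items()}
    edges = sorted(e for e, c in confidence.items() if c >= final_threshold)
    return edges, confidence


ballots = [
    [("age", "treatment"), ("treatment", "outcome")],   # agent 1
    [("treatment", "outcome")],                         # agent 2
    [("treatment", "outcome"), ("outcome", "age")],     # agent 3
]
edges, confidence = consensus_edges(ballots, final_threshold=0.5)
print(edges)  # [('treatment', 'outcome')]
```

Lowering final_threshold admits more contested edges; raising it keeps only those most agents agree on.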

LLMClient

Minimal interface expected by causal_mas and friends.

Subclasses / adapters must implement chat(role, prompt), a single-turn completion call that returns the model's plain-text response. Everything else (streaming, tools, JSON mode) is deliberately out of scope because causal_mas only needs a bag of edge proposals or critiques.
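A minimal sketch of an object that satisfies this interface, useful for offline experiments. RuleBasedClient and the "critic" role name are hypothetical, and whether causal_mas accepts any object with a chat(role, prompt) method or requires a subclass of LLMClient is an assumption to verify; echo_client is the documented way to script responses.

```python
# Hypothetical offline client exposing chat(role, prompt) -> str.
# Whether StatsPAI accepts such a duck-typed object is an assumption.
from typing import Dict, List


class RuleBasedClient:
    """Return a canned reply per agent role and log every call."""

    def __init__(self, replies: Dict[str, str]) -> None:
        self.replies = replies
        self.history: List[Dict[str, str]] = []

    def chat(self, role: str, prompt: str) -> str:
        # Single-turn: no streaming, no tools, just plain text back.
        reply = self.replies.get(role, "")
        self.history.append({"role": role, "prompt": prompt, "response": reply})
        return reply


client = RuleBasedClient({"proposer": "age -> treatment\ntreatment -> outcome"})
print(client.chat("proposer", "Propose edges among: age, treatment, outcome"))
print(repr(client.chat("critic", "Critique the proposal")))  # '' for unknown roles
```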

LLMConstrainedDAGResult dataclass

Output of llm_dag_constrained.

Attributes:

  • final_edges (list of (str, str)): Directed edges in the final CPDAG.
  • edge_confidence (DataFrame): One row per candidate edge, with columns edge (tuple), llm_score (float in [0, 1] or NaN), ci_pvalue (float or NaN), retained (bool), and source (one of 'required' / 'forbidden' / 'ci-test').
  • iteration_log (list of dict): Per-iteration structured trace of which edges were proposed, validated, and demoted.
  • skeleton (DataFrame): Final undirected adjacency matrix (variables x variables).
  • cpdag (DataFrame): Final CPDAG adjacency matrix.
  • variables (list of str)
  • n_obs (int)
  • alpha (float)
  • converged (bool): True if the loop stopped early because no edges were demoted in the most recent iteration.
  • provenance (dict)

to_dag

to_dag()

Convert the final CPDAG into a statspai.dag.DAG.

DAGValidationResult dataclass

Output of llm_dag_validate.

LLMConfigurationError

Bases: RuntimeError

Raised when no LLM provider can be resolved.

The error message points the user at concrete remediation steps rather than a generic "no key found", so agents and CLI users alike know exactly what to type next.

llm_dag_propose

llm_dag_propose(variables: List[str], domain: str = '', client: Optional[Any] = None, seed: int = 0) -> LLMDAGProposal

Propose a candidate DAG from variable names + domain description.

Parameters:

  • variables (list of str, required): Names of variables in the dataset.
  • domain (str, default ''): Free-text domain description (e.g. "labor economics, education and earnings"). Helps the LLM but is ignored by the heuristic backend.
  • client (object, default None): An LLM client implementing .complete(prompt: str) -> str. If None, use the deterministic heuristic backend.
  • seed (int, default 0)

Returns:

  • LLMDAGProposal
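When a real client is used, its free-text reply has to be turned into edges and checked for cycles before it can count as a DAG. The sketch below assumes one "A -> B" edge per line or per ";"-separated segment (the format used in the echo_client example and by to_dag_string) and uses the standard-library graphlib (Python 3.9+) for the cycle check. parse_edges and is_acyclic are hypothetical helpers, not part of StatsPAI.

```python
# Hypothetical post-processing of an LLM reply into a candidate DAG.
import re
from graphlib import CycleError, TopologicalSorter
from typing import List, Sequence, Tuple

EDGE_RE = re.compile(r"^\s*([\w.]+)\s*->\s*([\w.]+)\s*$")


def parse_edges(text: str, variables: Sequence[str]) -> List[Tuple[str, str]]:
    """Keep only well-formed 'A -> B' lines whose endpoints are known variables."""
    known = set(variables)
    edges = []
    for line in text.replace(";", "\n").splitlines():
        match = EDGE_RE.match(line)
        if match and match.group(1) in known and match.group(2) in known:
            edges.append((match.group(1), match.group(2)))
    return edges


def is_acyclic(edges: Sequence[Tuple[str, str]]) -> bool:
    """True if the directed edges contain no cycle."""
    sorter = TopologicalSorter()
    for parent, child in edges:
        sorter.add(child, parent)  # child depends on (comes after) parent
    try:
        sorter.prepare()  # raises CycleError if a cycle exists
    except CycleError:
        return False
    return True


reply = "age -> treatment\ntreatment -> outcome\nfoo -> bar"
edges = parse_edges(reply, ["age", "treatment", "outcome"])
print(edges, is_acyclic(edges))
```

Filtering to known variable names discards hallucinated nodes such as foo and bar instead of silently adding them to the graph.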

llm_unobserved_confounders

llm_unobserved_confounders(treatment: str, outcome: str, domain: str = 'health', client: Optional[Any] = None, point_estimate_rr: float = 1.5) -> UnobservedConfounderProposal

Enumerate plausible unobserved confounders for a study.

Parameters:

  • treatment (str, required): Free-text description of the treatment (used by the LLM, ignored by the heuristic backend).
  • outcome (str, required): Free-text description of the outcome (used by the LLM, ignored by the heuristic backend).
  • domain (str, one of 'health', 'education', 'labor', 'policy', 'marketing'; default 'health')
  • client (object, default None): LLM client with .complete(prompt: str) -> str.
  • point_estimate_rr (float, default 1.5): Observed risk ratio. Suggested E-values are scaled relative to it, so the user can read "to nullify an RR of X you'd need an unobserved RR of Y".

Returns:

  • UnobservedConfounderProposal
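For reference, the standard E-value for a point estimate on the risk-ratio scale (VanderWeele and Ding, 2017) is RR + sqrt(RR * (RR - 1)), with protective estimates (RR < 1) inverted first. The sketch below computes it; how StatsPAI scales its suggested E-values against point_estimate_rr is not spelled out on this page, so treat this as background rather than the function's internals.

```python
# Standard E-value for a risk ratio (VanderWeele & Ding, 2017).
# Background sketch only; not necessarily how StatsPAI scales its suggestions.
import math


def e_value(rr: float) -> float:
    """Minimum RR an unmeasured confounder needs with both treatment and
    outcome to fully explain away an observed risk ratio rr."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimate: work on the inverted scale
    return rr + math.sqrt(rr * (rr - 1.0))


print(round(e_value(1.5), 3))  # 2.366
```

By this formula, the default point_estimate_rr=1.5 would be nullified only by an unobserved confounder associated with both treatment and outcome at an RR of about 2.37.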

llm_sensitivity_priors

llm_sensitivity_priors(treatment: str, outcome: str, domain: str = 'health', client: Optional[Any] = None) -> SensitivityPriorProposal

Propose sensitivity-analysis priors for the substantive setting.

Parameters:

  • treatment (str, required)
  • outcome (str, required)
  • domain (str, one of 'health', 'education', 'labor', 'policy', 'marketing'; default 'health')
  • client (object, default None): LLM client with .complete(prompt: str) -> str.

Returns:

  • SensitivityPriorProposal
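In sensemakr-style analysis (Cinelli and Hazlett, 2020), a prior on a confounder's partial R² with the outcome (r2_yz) and with the treatment (r2_dz) translates into a bound on the bias of the treatment coefficient: |bias| = se * sqrt(df) * sqrt(r2_yz * r2_dz / (1 - r2_dz)). The sketch evaluates that bound for a few prior scenarios. The scenario names, values, and regression output are hypothetical, and the fields of SensitivityPriorProposal are not documented on this page, so mapping them onto these inputs is an assumption.

```python
# Omitted-variable-bias bound from Cinelli & Hazlett (2020), sensemakr-style.
# The prior scenarios below are hypothetical placeholders, not StatsPAI output.
import math


def ovb_bias_bound(se: float, dof: int, r2_yz: float, r2_dz: float) -> float:
    """Maximum |bias| of the treatment estimate given a confounder's partial R^2 values."""
    if not (0.0 <= r2_yz <= 1.0 and 0.0 <= r2_dz < 1.0):
        raise ValueError("need r2_yz in [0, 1] and r2_dz in [0, 1)")
    return se * math.sqrt(dof) * math.sqrt(r2_yz * r2_dz / (1.0 - r2_dz))


estimate, se, dof = 0.30, 0.10, 100   # hypothetical regression output
priors = {"weak": (0.01, 0.01), "moderate": (0.05, 0.05), "strong": (0.15, 0.10)}
for label, (r2_yz, r2_dz) in priors.items():
    bias = ovb_bias_bound(se, dof, r2_yz, r2_dz)
    # Worst case: the bias pushes the estimate toward zero.
    print(f"{label:9s} |bias| <= {bias:.3f}  adjusted estimate >= {estimate - bias:.3f}")
```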

openai_client

openai_client(model: str = 'gpt-4o-mini', *, api_key: Optional[str] = None, base_url: Optional[str] = None, organization: Optional[str] = None, temperature: float = 0.0, max_tokens: int = 1024, max_retries: int = 3, system_prompt: Optional[str] = None) -> LLMClient

Construct an OpenAI-compatible LLMClient.

Requires the optional openai>=1.0 extra. Accepts any base_url override so you can point this at an OpenAI-compatible endpoint (Azure OpenAI, vLLM, Ollama's OpenAI-compat mode, ...).

Examples:

>>> import statspai as sp
>>> client = sp.causal_llm.openai_client(
...     model="gpt-4o-mini", api_key="sk-...",
... )
>>> res = sp.causal_llm.causal_mas(
...     variables=["age","treatment","outcome"], client=client,
... )

anthropic_client

anthropic_client(model: str = 'claude-opus-4-7', *, api_key: Optional[str] = None, base_url: Optional[str] = None, temperature: float = 0.0, max_tokens: int = 1024, max_retries: int = 3, thinking_budget: int = 0, system_prompt: Optional[str] = None) -> LLMClient

Construct an Anthropic-compatible LLMClient.

Requires the optional anthropic>=0.30 extra. Defaults to Claude Opus 4.7 (the latest generally available model as of StatsPAI v1.4).

Parameters:

  • thinking_budget (int, default 0): Enable Claude extended thinking with this many reasoning tokens. Values >= 1024 activate the feature; 0 disables it (the legacy behaviour). The budget is consumed from max_tokens, so set max_tokens comfortably above thinking_budget + expected_answer_tokens. The reasoning trace is captured on client.history[-1]['thinking'] for auditability but is not returned as part of the chat response.

Examples:

>>> import statspai as sp
>>> client = sp.causal_llm.anthropic_client(
...     model="claude-opus-4-7",
...     thinking_budget=4096,
...     max_tokens=8192,
... )

echo_client

echo_client(response_fn: Callable[[str, str], str]) -> LLMClient

Deterministic scripted-response client for testing.

Examples:

>>> import statspai as sp
>>> def scripted(role, prompt):
...     if role == 'proposer':
...         return 'age -> treatment\ntreatment -> outcome'
...     return ''
>>> client = sp.causal_llm.echo_client(scripted)
>>> res = sp.causal_llm.causal_mas(
...     variables=['age','treatment','outcome'], client=client,
... )
>>> ('treatment', 'outcome') in res.edges
True

llm_dag_constrained

llm_dag_constrained(data: DataFrame, variables: Optional[Sequence[str]] = None, descriptions: Optional[Dict[str, str]] = None, *, oracle: Optional[Callable[[Sequence[str], Dict[str, str]], Any]] = None, alpha: float = 0.05, ci_test: str = 'fisherz', max_iter: int = 3, high_conf_threshold: float = 0.7, low_conf_threshold: float = 0.3, forbid_low_conf: bool = False, verbose: bool = False) -> LLMConstrainedDAGResult

Closed-loop LLM-assisted causal discovery.

Iterate propose → constrain → CI-validate → demote until the proposed required-edge set stops shrinking or max_iter is hit.

Parameters:

  • data (DataFrame, required): Observational data.
  • variables (sequence of str, default None): Subset of columns to include in the discovery. Defaults to all numeric columns of data.
  • descriptions (dict, default None): Variable name -> human-readable description (passed to the oracle).
  • oracle (callable, default None): Function f(variables, descriptions) -> list[(from, to[, confidence])]. When omitted, the loop falls back to plain PC discovery (data only) and returns a single-iteration result with no LLM scores.
  • alpha (float, default 0.05): CI-test significance level for both PC and the validation pass.
  • ci_test ('fisherz', default 'fisherz'): Conditional independence test.
  • max_iter (int, default 3): Upper bound on the number of propose-validate cycles.
  • high_conf_threshold (float, default 0.7): Minimum LLM confidence for an edge to be injected into PC as a required background-knowledge constraint.
  • low_conf_threshold (float, default 0.3): LLM confidence below which an edge is treated as a forbidden candidate (only when forbid_low_conf=True).
  • forbid_low_conf (bool, default False): When True, low-confidence edges are forbidden in the PC skeleton instead of being passed through as plain candidates. Off by default: most LLMs return only positive edges, and forbidding low-confidence ones risks over-pruning.
  • verbose (bool, default False): Print per-iteration progress.

Returns:

  • LLMConstrainedDAGResult

Examples:

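>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> # Simulate data from the DAG Z -> X -> Y
>>> rng = np.random.default_rng(0)
>>> Z = rng.normal(size=500)
>>> X = 0.8 * Z + rng.normal(size=500)
>>> df = pd.DataFrame({'X': X, 'Y': 0.5 * X + rng.normal(size=500), 'Z': Z})
>>> def echo_oracle(vars_, desc):
...     return [('X', 'Y', 0.95), ('Z', 'X', 0.9)]
>>> r = sp.llm_dag_constrained(df, variables=['X', 'Y', 'Z'],
...                            oracle=echo_oracle, max_iter=3)
>>> r.summary()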
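The two thresholds split the oracle's proposals into required, forbidden, and plain candidate edges before PC runs. A minimal sketch of that split, under stated assumptions: proposals without a confidence score are kept as plain candidates, an edge is required when its score is at least high_conf_threshold, and it is forbidden when its score is strictly below low_conf_threshold and forbid_low_conf=True. partition_oracle_edges is a hypothetical helper; llm_dag_constrained's own rules may differ at the boundaries.

```python
# Hypothetical sketch of threshold-based edge partitioning; the boundary
# rules (>= high, < low) and the handling of unscored edges are assumptions.
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def partition_oracle_edges(
    proposals: Sequence[tuple],
    high_conf_threshold: float = 0.7,
    low_conf_threshold: float = 0.3,
    forbid_low_conf: bool = False,
) -> Dict[str, List[Edge]]:
    """Split (from, to[, confidence]) tuples into required/forbidden/candidate edges."""
    groups: Dict[str, List[Edge]] = {"required": [], "forbidden": [], "candidate": []}
    for item in proposals:
        edge = (item[0], item[1])
        conf = float(item[2]) if len(item) > 2 else math.nan  # unscored -> NaN
        if conf >= high_conf_threshold:            # NaN compares False
            groups["required"].append(edge)
        elif forbid_low_conf and conf < low_conf_threshold:
            groups["forbidden"].append(edge)
        else:
            groups["candidate"].append(edge)
    return groups


proposals = [("X", "Y", 0.95), ("Z", "X", 0.9), ("Y", "Z", 0.1), ("W", "X")]
print(partition_oracle_edges(proposals))
print(partition_oracle_edges(proposals, forbid_low_conf=True))
```

With the default forbid_low_conf=False, the low-scoring Y -> Z edge stays a plain candidate that the CI tests can still keep or drop, which matches the stated aim of not over-pruning.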

llm_dag_validate

llm_dag_validate(dag, data: DataFrame, *, alpha: float = 0.05, ci_test: str = 'fisherz') -> DAGValidationResult

Per-edge CI-test validation of a declared DAG.

For every directed edge a -> b in dag, run a partial-correlation CI test of a ⟂ b | parents(b) \ {a}. Edges with p-value <= alpha are supported (the test rejects conditional independence, so the data give no reason to remove them); edges with p-value > alpha are unsupported (the data are consistent with the conditional independence implied by removing the edge).

Parameters:

  • dag (statspai.dag.DAG, or any object exposing an edges attribute; required): Declared causal graph. Latent _L_* nodes are ignored.
  • data (DataFrame, required)
  • alpha (float, default 0.05)
  • ci_test ('fisherz', default 'fisherz')

Returns:

  • DAGValidationResult
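The per-edge check can be sketched with a Gaussian partial-correlation (Fisher z) test: residualise both endpoints on parents(b) \ {a}, correlate the residuals, and convert to a two-sided p-value. This uses only NumPy and the standard library; fisher_z_pvalue and validate_edges are hypothetical names, and StatsPAI's 'fisherz' implementation may differ in details such as degrees-of-freedom handling.

```python
# Hypothetical sketch of per-edge Fisher-z validation; not StatsPAI's code.
import math
from typing import List, Sequence, Tuple

import numpy as np


def fisher_z_pvalue(data: np.ndarray, i: int, j: int, cond: Sequence[int]) -> float:
    """Two-sided p-value for column i independent of column j given columns cond."""
    n = data.shape[0]
    design = np.column_stack([np.ones(n)] + [data[:, k] for k in cond])
    # Residualise both variables on the conditioning set (plus an intercept).
    res_i = data[:, i] - design @ np.linalg.lstsq(design, data[:, i], rcond=None)[0]
    res_j = data[:, j] - design @ np.linalg.lstsq(design, data[:, j], rcond=None)[0]
    r = float(np.corrcoef(res_i, res_j)[0, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def validate_edges(edges, data, names, alpha=0.05) -> List[Tuple[str, str, float, bool]]:
    """For each a -> b, test a _||_ b | parents(b) minus a; return (a, b, p, supported)."""
    col = {name: k for k, name in enumerate(names)}
    out = []
    for a, b in edges:
        cond = [col[p] for p, c in edges if c == b and p != a]
        p = fisher_z_pvalue(data, col[a], col[b], cond)
        out.append((a, b, p, p <= alpha))
    return out


rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)         # true DAG: Z -> X -> Y (no Z -> Y)
data = np.column_stack([x, y, z])
declared = [("Z", "X"), ("X", "Y"), ("Z", "Y")]
for a, b, p, ok in validate_edges(declared, data, ["X", "Y", "Z"]):
    print(f"{a} -> {b}: p = {p:.3g}, supported = {ok}")
```

The declared Z -> Y edge has no direct effect in the simulated data, so once X is conditioned on, its test will usually (though not always, at a 5% level) come back unsupported.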

get_llm_client

get_llm_client(*, client: Any = None, provider: Optional[str] = None, model: Optional[str] = None, api_key: Optional[str] = None, allow_interactive: bool = True, config_path=None, **kwargs: Any)

Resolve an LLMClient via layered fallback.

See the module docstring for the full resolution order. Most users don't call this directly: it is wired into sp.paper(..., llm='auto') and the closed-loop LLM-DAG entry points, so the typical workflow is "set an env var and forget".

Parameters:

  • client (LLMClient, default None): Already-built client, returned as-is. (Layer 1.)
  • provider (str, 'anthropic' or 'openai'; default None): Force a specific provider. (Layer 2.)
  • model (str, default None): Force a specific model. Defaults to DEFAULT_MODELS[provider].
  • api_key (str, default None): Passed through to the constructed client. When omitted, the provider SDK reads its standard env var (ANTHROPIC_API_KEY / OPENAI_API_KEY).
  • allow_interactive (bool, default True): Whether to fall back to a stdin prompt when prior layers fail. Set to False in agent / Jupyter contexts where input() would hang the kernel.
  • config_path (Path, default None): Override the config file location (mainly for testing).
  • **kwargs (Any): Forwarded to the provider's client constructor (e.g. temperature, max_tokens, thinking_budget for Anthropic, base_url for OpenAI).

Raises:

  • LLMConfigurationError: When no path resolves and either allow_interactive=False or stdin is not a TTY (so the interactive prompt can't run).
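The layered fallback can be sketched as a chain of early returns. This is a simplified, hypothetical illustration: it reports which layer resolved instead of building a client, it omits the interactive stdin layer, and the placement of the config-file layer ahead of the environment-variable layer is an assumption inferred from configure_llm's description (a saved preference overrides the Anthropic tie-break). The LLMConfigurationError class here is a local stand-in so the sketch runs without StatsPAI.

```python
# Hypothetical, simplified resolver; layer order beyond Layers 1-2 is assumed.
from typing import Any, Mapping, Optional, Tuple

ENV_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


class LLMConfigurationError(RuntimeError):
    """Local stand-in for statspai's exception of the same name."""


def resolve(
    *,
    client: Any = None,
    provider: Optional[str] = None,
    config: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Any]:
    """Return (layer, value) for the first layer that yields a choice."""
    if client is not None:                      # Layer 1: ready-made client
        return "client", client
    if provider is not None:                    # Layer 2: explicit provider
        return "provider", provider
    if config and config.get("provider"):       # assumed: saved preference next
        return "config", config["provider"]
    env = env or {}
    found = [name for name, var in ENV_VARS.items() if env.get(var)]
    if found:                                   # tie-break to Anthropic
        return "env", "anthropic" if "anthropic" in found else found[0]
    raise LLMConfigurationError(
        "No LLM provider found. Set ANTHROPIC_API_KEY or OPENAI_API_KEY, "
        "or save a preference with sp.causal_llm.configure_llm(provider=...)."
    )


both = {"ANTHROPIC_API_KEY": "a", "OPENAI_API_KEY": "o"}
print(resolve(env=both))                                   # ('env', 'anthropic')
print(resolve(env=both, config={"provider": "openai"}))    # ('config', 'openai')
```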

list_available_providers

list_available_providers() -> Dict[str, Dict[str, Any]]

Inspect the environment and return what's currently usable.

Returns:

  • dict: {provider_name: {"available": bool, "default_model": str, "env_var": str}} for each known provider. Useful both for the interactive prompt and for tools that want to surface available LLMs to the user.
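A minimal sketch of that return shape, assuming "available" simply means the provider's API-key environment variable is set. The default models are taken from the openai_client and anthropic_client signatures on this page; the library's DEFAULT_MODELS table, and any extra checks such as whether the SDK is installed, may differ. available_providers is a hypothetical name.

```python
# Hypothetical sketch of an environment-based provider inventory.
# Default models mirror the openai_client / anthropic_client defaults;
# StatsPAI's DEFAULT_MODELS may differ.
import os
from typing import Any, Dict, Mapping, Optional

KNOWN_PROVIDERS: Dict[str, Dict[str, str]] = {
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "default_model": "claude-opus-4-7"},
    "openai": {"env_var": "OPENAI_API_KEY", "default_model": "gpt-4o-mini"},
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, Any]]:
    """Report, per provider, whether its API-key env var is set (non-empty)."""
    env = os.environ if env is None else env
    return {
        name: {
            "available": bool(env.get(spec["env_var"])),
            "default_model": spec["default_model"],
            "env_var": spec["env_var"],
        }
        for name, spec in KNOWN_PROVIDERS.items()
    }


print(available_providers({"OPENAI_API_KEY": "sk-test"}))
```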

configure_llm

configure_llm(*, provider: Optional[str] = None, model: Optional[str] = None, config_path=None)

Persist a provider+model preference to ~/.config/statspai/llm.toml.

Use this for a "set once and forget" workflow on a machine where both ANTHROPIC_API_KEY and OPENAI_API_KEY are set; without a saved preference, the resolver tie-breaks to Anthropic.

Examples:

>>> import statspai as sp
>>> sp.causal_llm.configure_llm(
...     provider="openai", model="gpt-4o",
... )

llm_config_path

llm_config_path() -> Path

Return the platform-appropriate config file path.

Honours XDG_CONFIG_HOME on Linux/macOS; falls back to ~/.config if unset. Windows uses %APPDATA%.
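The platform rule can be sketched as follows. The statspai/llm.toml suffix comes from configure_llm's description; applying the same suffix under %APPDATA%, and the fallback used when APPDATA is unset, are assumptions. config_path is a hypothetical helper that takes the environment and platform as arguments so it can be exercised on any OS.

```python
# Hypothetical sketch of a platform-dependent config location.
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def config_path(env: Optional[Mapping[str, str]] = None, platform: str = sys.platform) -> Path:
    """Return <config dir>/statspai/llm.toml for the given environment and platform."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Assumption: %APPDATA%, falling back to the usual Roaming folder if unset.
        base = Path(env.get("APPDATA") or Path.home() / "AppData" / "Roaming")
    else:
        xdg = env.get("XDG_CONFIG_HOME")
        base = Path(xdg) if xdg else Path.home() / ".config"  # unset or empty
    return base / "statspai" / "llm.toml"


print(config_path({"XDG_CONFIG_HOME": "/tmp/cfg"}, "linux"))
```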

load_llm_config

load_llm_config(path: Optional[Path] = None) -> Dict[str, Any]

Read the TOML preferences file, or return an empty dict.

Never raises on a missing or malformed file; it returns {} so callers can fall through cleanly to the next resolution layer.