`statspai.output`¶

output ¶

Output utilities for regression and causal-inference results.

The package is organised by purpose:

Regression-table renderers (4 entry points historically — see PR-B design doc; regtable is the canonical one):

:func:regtable — canonical multi-model regression table renderer. Supports text / HTML / LaTeX / Markdown / Quarto / Excel / Word / DataFrame, journal templates, multi-row SE, repro provenance.
:func:esttab — Stata estout / esttab clone (use eststo to register, then esttab to print). Thin Stata-flavoured surface.
:func:modelsummary — R modelsummary clone (functional API).
:func:outreg2 / :class:OutReg2 — Stata outreg2 clone (Excel-first surface).

Single-table helpers:

:func:tab — Stata-style tabulate.
:func:sumstats — descriptive summary statistics.
:func:balance_table — covariate-balance table.
:func:mean_comparison — two-group mean comparison with t-test / ranksum / chi2 (lives in mean_comparison.py since v1.6.x — re-exported from regression_table for back-compat).

Multi-table / paper bundles:

:func:paper_tables — Main / Heterogeneity / Robustness panels.
:class:Collection / :func:collect — narrative document builder.

Plotting:

:func:coefplot — coefficient plot.

Provenance / replication / citations:

:class:Provenance, :func:attach_provenance, :func:get_provenance, :func:compute_data_hash, :func:format_provenance, :func:lineage_summary.
:class:ReplicationPack, :func:replication_pack.
:func:cite, :data:CSL_REGISTRY, :func:csl_url, ...

Adapters:

:func:to_gt, :func:is_great_tables_available — great_tables adapter (lazy).

RegtableResult ¶

Rich result object for regression tables with multi-format export.

Returned by :func:regtable. Bundles one or more fitted models into a publication-style coefficient table and renders to text, LaTeX, HTML, Markdown / Quarto, a :class:pandas.DataFrame, Excel, or Word.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> r1 = sp.regress("log_wage ~ education", data=df)
>>> r2 = sp.regress("log_wage ~ education + experience", data=df)
>>> table = sp.regtable(r1, r2)
>>> type(table).__name__
'RegtableResult'
>>> bool("education" in table.to_text().lower())
True
>>> bool(table.to_latex().strip())
True

summary ¶

summary() -> str

Return the configured human-readable regression table render.

RegtableResult already exposes explicit renderers such as :meth:to_text, :meth:to_latex, and :meth:to_markdown. A standard summary() surface makes the object behave like the rest of StatsPAI's result containers without changing any rendering semantics.

to_latex ¶

to_latex(*, siunitx: bool = False, threeparttable: bool = False, siunitx_preamble: bool = False) -> str

Render the table as a LaTeX table float.

Parameters:

Name	Type	Description	Default
`siunitx`	`bool`	Decimal-align the numeric columns with `siunitx` `S` columns (journal style): coefficients align on the decimal point and significance stars ride along as `\textsuperscript`. Requires `\usepackage{siunitx}` (v3). Not supported together with `multi_se` / `eform` / `apply_coef` / cell templates / `column_spanners` (raises `NotImplementedError`).	`False`
`threeparttable`	`bool`	Wrap the table in `threeparttable` and move the footnotes into a `tablenotes` block (requires `\usepackage{threeparttable}`).	`False`
`siunitx_preamble`	`bool`	Prepend a comment line listing the required `\usepackage` lines.	`False`

to_markdown ¶

to_markdown(*, quarto: bool = False) -> str

Render the table as Markdown.

Parameters:

Name	Type	Description	Default
`quarto`	`bool`	When `True`, append a Quarto cross-reference caption block of the form `: <caption> {#tbl-<label>}` so the table can be referenced via `@tbl-<label>` in the manuscript. Requires `quarto_label` to have been set on the result (typically via `regtable(..., quarto_label="main")`). Equivalent to calling :meth:`to_quarto`.	`False`

to_quarto ¶

to_quarto() -> str

Render as a Quarto-cross-referenceable Markdown table.

Builds on :meth:to_markdown and appends a Quarto caption block of the form::

: <caption> {#tbl-<label>}

which lets the manuscript reference the table via @tbl-<label>. The tbl- prefix is auto-prepended when the user passes a bare quarto_label="main".

Behaviour

quarto_label is required. Without it, ValueError is raised — Quarto cross-references need an id.
quarto_caption falls back to title when not provided. If neither is set, a generic "Regression results" is used and a warning is emitted.
The leading title line is dropped (the caption block replaces it) to avoid duplicating the heading.
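
The caption-block rules above can be sketched in a few lines of plain Python. This is an illustration of the documented behaviour only, not StatsPAI's implementation: the helper name `quarto_caption_block` and the wording of the error and warning messages are assumptions.

```python
import warnings
from typing import Optional


def quarto_caption_block(
    label: Optional[str],
    caption: Optional[str] = None,
    title: Optional[str] = None,
) -> str:
    """Build a ': <caption> {#tbl-<label>}' line (hypothetical helper)."""
    if not label:
        # Quarto cross-references need an id, so a missing label is an error.
        raise ValueError("quarto_label is required for Quarto output")
    # Auto-prepend the 'tbl-' prefix when a bare label such as "main" is given.
    if not label.startswith("tbl-"):
        label = f"tbl-{label}"
    # The caption falls back to the title, then to a generic default.
    text = caption or title
    if not text:
        warnings.warn("no caption or title set; using a generic caption")
        text = "Regression results"
    return f": {text} {{#{label}}}"


print(quarto_caption_block("main", title="Wage regressions"))
# : Wage regressions {#tbl-main}
```

The resulting line is what lets manuscript prose refer to the table as `@tbl-main`.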

to_dataframe ¶

to_dataframe() -> DataFrame

Return the table as a pandas DataFrame.

to_dict ¶

to_dict(*, renders: Union[bool, Sequence[str], None] = None) -> Dict[str, Any]

Return a JSON-safe dict representation of the table.

The package is agent-native (CLAUDE.md §1): a rendered regression table is a first-class artifact an LLM tool loop should be able to serialise, cache, and reason over without re-rendering. The payload carries three layers:

metadata — model_labels, dep_var_labels, panel_labels, title, notes, template, se_type, stars / star_levels, requested_stats, coef_labels.
table — the rendered cell grid (the same strings :meth:to_dataframe produces: "2.067***", "(0.074)" …), as a list of {"term": ..., <model label>: <cell>} records.
models — the numeric truth per model: coefficient estimates, standard errors, t / p values, confidence bounds, summary stats and the dependent variable. Use this layer for machine reasoning; use table for faithful re-display.

Parameters:

Name	Type	Description	Default
`renders`	`bool \| sequence of str`	When truthy, also embed fully rendered strings under `"renders"`. `True` embeds `latex` / `html` / `markdown` / `text`; a sequence selects specific formats (e.g. `renders=["latex"]`). Default `None` keeps the payload compact.	`None`

Returns:

Type	Description
`dict`	JSON-safe; round-trips through `json.dumps`.

Examples:

>>> import statspai as sp
>>> tbl = sp.regtable(m1, m2, template="aer")
>>> payload = tbl.to_dict()
>>> payload["models"][0]["coefficients"]["x"]["estimate"]
2.067
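
To make the three layers concrete, the sketch below builds a small hand-written stand-in for a payload and round-trips it through `json`. Only the top-level layer names, the `{"term": ..., <model label>: <cell>}` record shape, and the `models[i]["coefficients"][name]["estimate"]` access path come from the description above; the `std_error` key and the second model's numbers are made up for illustration.

```python
import json

# Hand-written stand-in for a to_dict() payload, not real library output.
payload = {
    "metadata": {"model_labels": ["(1)", "(2)"], "title": "Wage regressions"},
    "table": [
        {"term": "x", "(1)": "2.067***", "(2)": "1.954***"},
        {"term": "", "(1)": "(0.074)", "(2)": "(0.081)"},
    ],
    "models": [
        # "std_error" is an assumed key name; the values are illustrative.
        {"coefficients": {"x": {"estimate": 2.067, "std_error": 0.074}}},
        {"coefficients": {"x": {"estimate": 1.954, "std_error": 0.081}}},
    ],
}

# JSON-safe means the payload survives a dumps/loads round trip unchanged.
restored = json.loads(json.dumps(payload))
assert restored == payload

# Machine reasoning reads numbers from "models", not from the display strings.
estimates = [m["coefficients"]["x"]["estimate"] for m in restored["models"]]
print(estimates)  # [2.067, 1.954]
```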

to_json ¶

to_json(*, indent: Optional[int] = None, renders: Union[bool, Sequence[str], None] = None) -> str

Serialise :meth:to_dict via json.dumps.

to_agent_summary ¶

to_agent_summary(*, max_rows: int = 12, max_terms: int = 8) -> Dict[str, Any]

Return a compact JSON-ready summary for agent/tool workflows.

The full :meth:to_dict payload is lossless and can be large. This method keeps the decision-critical metadata, the first rendered rows, and a bounded coefficient slice per model so agents can inspect a regression table without pulling every rendered/export format.
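
One way to picture the bounding is a function that trims a `to_dict`-style payload to its first rows and first coefficients. The sketch below is an assumption about the general idea, not the library's code: `compact_summary`, the `table_head` / `n_rows_total` / `n_terms_total` keys, and the demo payload are all hypothetical.

```python
from typing import Any, Dict


def compact_summary(payload: Dict[str, Any], max_rows: int = 12,
                    max_terms: int = 8) -> Dict[str, Any]:
    """Trim a to_dict()-style payload (hypothetical helper)."""
    models = []
    for model in payload.get("models", []):
        coefs = model.get("coefficients", {})
        # Keep only the first max_terms coefficients, in their stored order.
        kept = dict(list(coefs.items())[:max_terms])
        models.append({"coefficients": kept, "n_terms_total": len(coefs)})
    rows = payload.get("table", [])
    return {
        "metadata": payload.get("metadata", {}),
        "table_head": rows[:max_rows],   # first rendered rows only
        "n_rows_total": len(rows),       # lets the caller see what was cut
        "models": models,
    }


demo = {
    "metadata": {"title": "Demo"},
    "table": [{"term": f"x{i}", "(1)": f"{i}.000"} for i in range(20)],
    "models": [{"coefficients": {f"x{i}": {"estimate": float(i)} for i in range(20)}}],
}
summary = compact_summary(demo, max_rows=3, max_terms=2)
print(len(summary["table_head"]), list(summary["models"][0]["coefficients"]))
# 3 ['x0', 'x1']
```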

from_dict `classmethod` ¶

from_dict(payload: Dict[str, Any]) -> 'RegtableResult'

Reconstruct a :class:RegtableResult from a :meth:to_dict payload.

The inverse of :meth:to_dict. Rebuilds one normalised model per entry in the models layer (coefficient estimates, SE, t, p, CI, summary stats, dependent variable), re-splits them into panels per render_spec.panel_sizes, and restores the render-controlling metadata (labels, fmt, se_type, stars / star levels, stats, keep / drop / order, add_rows, alpha, template). For a table built without exotic options this round-trips exactly:

>>> import statspai as sp
>>> t = sp.regtable(m1, m2, template="aer")  # doctest: +SKIP
>>> RegtableResult.from_dict(t.to_dict()).to_latex() == t.to_latex()
True

Notes

Exotic features that are not part of the to_dict payload — stacked multi_se rows, eform transforms, column_spanners, tests rows, and custom apply_coef transforms — are NOT preserved across the round-trip (they reconstruct as a plain table). The serialised table / renders layers already capture their rendered form if you only need to re-display, not re-compute.

to_excel ¶

to_excel(filename: str) -> None

Export table to Excel as a strict book-tab three-line table.

Uses the shared _excel_style primitives so the visual output is byte-aligned with sumstats, tab, paper_tables, collection, modelsummary and outreg2: thick top rule above the column header, thin mid rule between header and body, thick bottom rule below the last data row, Times New Roman throughout.

to_word ¶

to_word(filename: str) -> None

Export table to Word (.docx) file in AER/QJE book-tab style.

The exported document follows economics-journal conventions: a heavy top rule, thin mid rule below the header, heavy bottom rule above notes, and no internal vertical borders. Body text is Times New Roman 10pt; the notes paragraph is 8pt italic.

to_docx ¶

to_docx(filename: str) -> None

Alias for :meth:to_word — mirrors Stata outreg2 convention.

save ¶

save(filename: str) -> None

Auto-detect format from file extension and save.
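
Extension-based dispatch of this kind is usually a small lookup table. The sketch below is not StatsPAI's code: the extensions are the ones listed for the `filename` parameter of `regtable`, while the format names, the `detect_format` helper, and the choice to raise `ValueError` on an unknown extension are assumptions.

```python
from pathlib import Path

# Hypothetical extension -> format table (format names are illustrative).
FORMAT_BY_SUFFIX = {
    ".tex": "latex", ".html": "html", ".md": "markdown", ".qmd": "quarto",
    ".docx": "word", ".xlsx": "excel", ".csv": "csv", ".json": "json",
}


def detect_format(filename: str) -> str:
    """Map a file name to an output format by its (case-insensitive) suffix."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        # Assumed behaviour: reject unknown extensions rather than guess.
        raise ValueError(f"unsupported extension: {suffix!r}") from None


print(detect_format("tables/table1.TEX"))  # latex
```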

EstimateTableResult ¶

Stata esttab result handle — thin wrapper over a :class:RegtableResult.

Preserves the EstimateTableResult type identity for callers that do isinstance(x, EstimateTableResult). Forwards every render method to the underlying regtable result; adds to_csv() for parity with the legacy esttab API (regtable does not natively expose CSV but the dataframe path is byte-identical to what the legacy esttab produced).

MeanComparisonResult ¶

Rich result object for balance / mean comparison tables.

Returned by :func:mean_comparison. Holds per-variable group means, standard deviations, the mean difference, and the test p-value, and renders to text, a :class:pandas.DataFrame, LaTeX, HTML, or Markdown.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> result = sp.mean_comparison(
...     df,
...     variables=["education", "experience", "log_wage"],
...     group="female",
... )
>>> type(result).__name__
'MeanComparisonResult'
>>> table = result.to_dataframe()
>>> bool("p-value" in table.columns)
True

to_excel ¶

to_excel(filename: str) -> None

Export balance table to Excel as a book-tab three-line table.

to_word ¶

to_word(filename: str) -> None

Export balance table to Word in AER/QJE book-tab style.

save ¶

save(filename: str) -> None

Auto-detect format from extension and save.

PaperTables `dataclass` ¶

Container for the multi-panel paper-table bundle.

Each attribute is a RegtableResult (has .text, .latex, .html, .to_latex(), ...). Iterate via .panels() or access by name: pt.main / pt.heterogeneity / pt.robustness / pt.placebo.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> r1 = sp.regress("log_wage ~ education", data=df)
>>> r2 = sp.regress("log_wage ~ education + experience", data=df)
>>> pt = sp.paper_tables(main=[r1, r2], template="aer")
>>> type(pt).__name__
'PaperTables'
>>> sorted(pt.panels().keys())
['main']
>>> latex = pt.to_latex()
>>> bool(latex.strip())
True

panels ¶

panels() -> Dict[str, RegtableResult]

Return {name: RegtableResult} for every populated panel.

to_latex ¶

to_latex(path: Optional[str] = None) -> str

Concatenate every panel's LaTeX into a single file.

If path is provided, also write to disk.

to_markdown ¶

to_markdown(path: Optional[str] = None) -> str

Return all panels stacked as GitHub-flavoured Markdown.

to_text ¶

to_text() -> str

Return all panels stacked as plain text (terminal-friendly).

to_dict ¶

to_dict() -> Dict[str, Any]

Return a JSON-safe dict representation of the multi-panel bundle.

Agent-native counterpart to the renderers: each populated panel (main / heterogeneity / robustness / placebo) carries the full :meth:RegtableResult.to_dict payload (metadata + rendered grid + numeric truth), so an LLM tool loop can cache and reason over the whole bundle without re-rendering.

to_json ¶

to_json(*, indent: Optional[int] = None) -> str

Serialise :meth:to_dict via json.dumps.

to_docx ¶

to_docx(path: str) -> str

Write every populated panel into a single .docx file.

Each panel renders as an AER/QJE book-tab table (heavy top rule, thin mid rule, heavy bottom rule, no internal borders). Panels are separated by a page break so the file lands on the co-author's desk ready for direct insertion into a manuscript.

Returns the file path that was written.

to_xlsx ¶

to_xlsx(path: str) -> str

Write every populated panel as a separate sheet in one workbook.

Sheet names are the panel names (main / heterogeneity / robustness / placebo). Each sheet uses the AER book-tab style: heavy rules above the header, below the header, and below the last data row.

Returns the file path that was written.

Collection ¶

A named, ordered bundle of tables and prose for a single document.

Add items in any order via the add_* methods (each returns self so calls can be chained). Render to any supported format via save(path) or one of the explicit to_* methods.

Parameters:

Name	Type	Description	Default
`title`	`str`	Displayed at the top of the rendered document.	`None`
`template`	`('aer', 'qje', 'econometrica', 'restat')`	Forwarded to `paper_tables` style; also drives the default star levels used by `add_regression`.	`'aer'`

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> m1 = sp.regress("log_wage ~ education", data=df)
>>> m2 = sp.regress("log_wage ~ education + experience", data=df)
>>> c = sp.Collection(title="Wage analysis", template="aer")
>>> _ = c.add_regression(m1, m2, name="main", title="Table 1")
>>> _ = c.add_summary(df, vars=["log_wage", "education"], title="Table 2")
>>> len(c)
2
>>> "Table 1" in c.to_markdown()
True

list ¶

list() -> DataFrame

Return a DataFrame summary (name / kind / title) for inspection.

to_frame ¶

to_frame(*, include_text: bool = False) -> DataFrame

Return a semantic long-format view of the collection.

This is the programmatic counterpart to Stata's collect layout and R's modelsummary/gt data pipeline: every rendered cell is represented as one row with stable dimensions item / kind / term / statistic / model and both a raw numeric value (when parseable) and the display formatted string.

Parameters:

Name	Type	Description	Default
`include_text`	`bool`	Include free-form text and headings as rows. Table workflows usually leave this off; document-audit workflows may want it.	`False`

Returns:

Type	Description
`DataFrame`	Long-format cell table with provenance-friendly dimensions.
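
The long-format idea can be illustrated without the library: each rendered cell becomes one record carrying its dimensions, a parsed number, and the original display string. In the sketch below the dimension names item / kind / term / statistic / model follow the description above; the `value` and `formatted` column names, the helpers, and the sample numbers are assumptions.

```python
import re
from typing import Dict, List, Optional

NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")


def parse_value(cell: str) -> Optional[float]:
    """Pull the first number out of a display string such as '2.067***'."""
    match = NUMERIC.search(cell.replace(",", ""))
    return float(match.group()) if match else None


def to_long(item: str, kind: str, rows: List[Dict[str, str]]) -> List[dict]:
    """Unpivot wide rows (one column per model) into one record per cell."""
    long_rows = []
    for row in rows:
        for model, cell in row.items():
            if model in ("term", "statistic"):
                continue  # these are dimensions, not model columns
            long_rows.append({
                "item": item, "kind": kind, "term": row["term"],
                "statistic": row["statistic"], "model": model,
                "value": parse_value(cell), "formatted": cell,
            })
    return long_rows


wide = [
    {"term": "education", "statistic": "estimate", "(1)": "0.076***"},
    {"term": "education", "statistic": "std_error", "(1)": "(0.005)"},
]
records = to_long("main", "regression", wide)
for record in records:
    print(record)
```

Keeping both the parsed number and the display string means downstream code can filter or aggregate on `value` while still reproducing the printed cell exactly.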

to_csv ¶

to_csv(path: Optional[str] = None, **kwargs: Any) -> str

Render :meth:to_frame to CSV; optionally write to path.

to_dict ¶

to_dict() -> Dict[str, Any]

Return a JSON-safe dict representation of the whole document.

Agent-native counterpart to :meth:save — every item (regression table, balance / summary table, free text) is serialised in order so an LLM tool loop can cache and reason over a multi-table document without re-rendering. Regression-table items carry the full :meth:RegtableResult.to_dict payload (metadata + rendered grid + numeric truth).

to_json ¶

to_json(*, indent: Optional[int] = None) -> str

Serialise :meth:to_dict via json.dumps.

add_regression ¶

add_regression(*results: Any, name: Optional[str] = None, title: Optional[str] = None, **regtable_kwargs: Any) -> 'Collection'

Add a regression table built from one or more model results.

regtable_kwargs are forwarded verbatim to sp.regtable.

add_table ¶

add_table(result: Any, *, name: Optional[str] = None, title: Optional[str] = None) -> 'Collection'

Add an already-built RegtableResult / MeanComparisonResult.

add_summary ¶

add_summary(data: DataFrame, vars: Optional[Sequence[str]] = None, *, stats: Optional[Sequence[str]] = None, name: Optional[str] = None, title: Optional[str] = None, labels: Optional[Dict[str, str]] = None) -> 'Collection'

Add a descriptive-statistics table built from a DataFrame.

Stores the underlying DataFrame; rendering re-uses the existing sumstats formatters so the AER book-tab style applies.

add_balance ¶

add_balance(data: DataFrame, treatment: str, variables: Sequence[str], *, weights: Optional[str] = None, test: str = 'ttest', name: Optional[str] = None, title: Optional[str] = None, fmt: str = '%.3f') -> 'Collection'

Add a treatment vs. control balance table (calls mean_comparison).

add_text ¶

add_text(text: str, *, name: Optional[str] = None, title: Optional[str] = None) -> 'Collection'

Add a free-form text block (rendered as a paragraph).

add_heading ¶

add_heading(text: str, *, level: int = 2, name: Optional[str] = None) -> 'Collection'

Add a section heading (level 1-3).

to_text ¶

to_text() -> str

Plain-text rendering of every item, top to bottom.

to_markdown ¶

to_markdown(path: Optional[str] = None) -> str

Render to GitHub-flavoured Markdown; optionally write to path.

to_html ¶

to_html(path: Optional[str] = None) -> str

Render to a single self-contained HTML document.

to_latex ¶

to_latex(path: Optional[str] = None) -> str

Concatenate every item's LaTeX into one .tex file.

to_docx ¶

to_docx(path: str) -> str

Write the entire collection to a single .docx file.

Each item renders in turn — headings as Word headings, text as paragraphs, tables in AER book-tab style — separated by page breaks between tables.

to_xlsx ¶

to_xlsx(path: str) -> str

Write the collection to a single workbook (one sheet per item).

save ¶

save(path: str) -> str

Auto-detect format from path extension and write.

CollectionItem `dataclass` ¶

One entry in a :class:Collection.

You rarely construct this directly; the Collection.add_* methods append items for you, and you read them back via Collection.get or by iterating the collection.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> c = sp.collect("demo")
>>> _ = c.add_summary(df, vars=["log_wage"], name="desc", title="Summary")
>>> item = c.get("desc")
>>> isinstance(item, sp.CollectionItem)
True
>>> item.name, item.kind, item.title
('desc', 'summary', 'Summary')

Provenance `dataclass` ¶

A traceable record of how a single estimate was produced.

Attributes:

Name	Type	Description
`function`	`str`	Fully qualified function name (e.g. `"statspai.did.callaway_santanna"`).
`params`	`dict`	JSON-serialisable summary of the call arguments.
`data_hash`	`str or None`	12-char SHA-256 prefix of the input data, or None when the data was too large to hash (>1M rows) or wasn't a recognised type.
`data_shape`	`list[int] or None`	`[n_rows, n_cols]` of the input frame, when known.
`run_id`	`str`	Per-call uuid4 — disambiguates two structurally identical calls in the same session.
`statspai_version`	`str`	Package version at the time of the call.
`python_version`	`str`	`"3.11.5"`-style.
`timestamp`	`str`	ISO-8601 wall-clock of when the call returned.

Examples:

>>> import statspai as sp
>>> prov = sp.Provenance(function="sp.did.callaway_santanna",
...                      params={"method": "dr"})
>>> prov.function
'sp.did.callaway_santanna'
>>> sorted(prov.to_dict())  # JSON-able view
['data_hash', 'data_shape', 'function', 'params', 'python_version', 'run_id', 'statspai_version', 'timestamp']
>>> prov.short().startswith("sp.did.callaway_santanna")
True
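
For intuition about the `data_hash`, `run_id`, and `timestamp` fields, the sketch below produces values of the same shape with the standard library. How StatsPAI serialises a frame before hashing is not stated here, so hashing raw CSV bytes is an assumption, as is the `short_data_hash` helper.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def short_data_hash(raw: bytes, length: int = 12) -> str:
    """First `length` hex characters of the SHA-256 digest of `raw`."""
    return hashlib.sha256(raw).hexdigest()[:length]


# Illustrative input: the bytes of a tiny CSV file.
csv_bytes = b"log_wage,education\n2.1,12\n2.6,16\n"
record = {
    "data_hash": short_data_hash(csv_bytes),              # stable for equal bytes
    "run_id": str(uuid.uuid4()),                          # unique per call
    "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601
}
print(record)
```

The hash identifies the data and is the same on every run, while the `run_id` differs on every call, which is what separates two structurally identical calls in one session.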

to_dict ¶

to_dict() -> Dict[str, Any]

Plain-dict view, suitable for JSON dumping.

short ¶

short() -> str

One-line human summary.

ReplicationPack ¶

Lightweight summary returned by :func:replication_pack.

Mostly for testing / programmatic inspection; users typically just care about output_path.

Examples:

>>> import os, tempfile
>>> import statspai as sp
>>> df = sp.cps_wage()
>>> res = sp.regress("log_wage ~ education + experience", data=df)
>>> out = os.path.join(tempfile.mkdtemp(), "reppack")
>>> pack = sp.replication_pack(res, out, env=False, bib=False,
...                            include_git_sha=False)
>>> type(pack).__name__
'ReplicationPack'
>>> bool(os.path.exists(pack.output_path))
True
>>> bool(len(pack.manifest.get("files", [])) > 0)
True

regtable ¶

regtable(*args: Any, panel_labels: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, dep_var_labels: Optional[List[str]] = None, model_labels: Optional[List[str]] = None, keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, stats: Optional[Sequence[str]] = None, se_type: str = 'se', stars: bool = True, star_levels: Optional[Tuple[float, ...]] = None, fmt: str = '%.3f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[List[str]] = None, add_rows: Optional[Dict[str, List[str]]] = None, alpha: float = 0.05, template: Optional[str] = None, diagnostics: Union[str, bool] = 'auto', multi_se: Optional[Dict[str, Sequence[Any]]] = None, repro: Union[bool, Dict[str, Any], None] = None, quarto_label: Optional[str] = None, quarto_caption: Optional[str] = None, eform: Union[bool, Sequence[bool]] = False, column_spanners: Optional[Sequence[Tuple[str, int]]] = None, coef_map: Optional[Dict[str, str]] = None, consistency_check: bool = True, estimate: Optional[str] = None, statistic: Optional[str] = None, notation: Union[str, Tuple[str, ...]] = 'stars', apply_coef: Optional[Any] = None, apply_coef_deriv: Optional[Any] = None, escape: bool = True, tests: Optional[Dict[str, Sequence[Any]]] = None, fixef_sizes: bool = False, vcov: Optional[str] = None, transpose: bool = False) -> RegtableResult

Unified publication-quality regression table.

Accepts model results as positional arguments. If the first argument is a list, each list is treated as a separate panel.

Parameters:

Name	Type	Description	Default
`*args`	`model results or lists of model results`	`EconometricResults`, `CausalResult`, or any duck-typed object with `params` / `std_errors` attributes. Pass multiple lists to create a multi-panel table.	`()`
`panel_labels`	`list of str`	Labels for each panel (e.g., `["Panel A: Wages", "Panel B: Hours"]`).	`None`
`coef_labels`	`dict`	Rename variables: `{"education": "Years of Education"}`.	`None`
`dep_var_labels`	`list of str`	Dependent variable labels shown below column headers.	`None`
`model_labels`	`list of str`	Column header labels. Defaults to `(1), (2), ...`.	`None`
`keep`	`list of str`	Only show these variables.	`None`
`drop`	`list of str`	Hide these variables.	`None`
`order`	`list of str`	Reorder variables.	`None`
`stats`	`list of str`	Summary statistics. Defaults to `["N", "R2", "adj_R2", "F"]`.	`None`
`se_type`	`str`	What to show beneath coefficients: `"se"`, `"t"`, `"p"`, or `"ci"` for confidence intervals.	``"se"``
`stars`	`bool`	Append significance stars.	`True`
`star_levels`	`tuple`	Thresholds for `*`, `**`, `***`.	``(0.10, 0.05, 0.01)``
`fmt`	`str`	Format string for numeric values. Pass any C-style format (`"%.0f"`, `"%.4f"`, ...) for fixed precision, or `"auto"` for magnitude-adaptive precision (recommended when a single table mixes dollar-magnitude coefficients like `1521` with elasticity-magnitude coefficients like `0.288` — fixed `"%.0f"` would round the latter to `0`).	``"%.3f"``
`output`	`str`	Controls what `str(result)` / `repr(result)` / `print(result)` returns — one of `"text"`, `"latex"`, `"html"`, `"markdown"`, `"quarto"`, `"word"`, `"excel"`. In Jupyter, `_repr_html_` always renders HTML regardless of this setting.	``"text"``
`filename`	`str`	Save the table to this file path. The format is chosen from the file extension (`.tex`/`.html`/`.md`/`.qmd`/`.docx`/ `.xlsx`/`.csv`/`.json`), independently of `output=`. The `.json` form writes the agent-native :meth:`RegtableResult.to_dict` payload. Pass a matching extension and `output=` to avoid surprises.	`None`
`quarto_label`	`str`	Quarto cross-reference id. Pass `"main"` to make the table referenceable as `@tbl-main` from the manuscript prose. The `tbl-` prefix is auto-prepended when missing. Required for `to_quarto()` and `output="quarto"`.	`None`
`quarto_caption`	`str`	Caption rendered alongside the Quarto cross-ref id. Falls back to `title` when omitted; if both are absent, a generic `"Regression results"` is used and a warning is emitted.	`None`
`title`	`str`	Table title / caption.	`None`
`notes`	`list of str`	Additional notes beneath the table.	`None`
`add_rows`	`dict`	Custom rows: `{"Controls": ["No", "Yes", "Yes"]}`. User-provided rows take precedence over auto-extracted diagnostic rows with the same label.	`None`
`alpha`	`float`	Significance level used when `se_type='ci'`. Displayed CI is `(1 - alpha) * 100`%. With `alpha=0.05` (default) the bounds come from the model's stored 95% CI; for any other `alpha` the bounds are recomputed as `b ± crit · se`, using the t-distribution when `df_resid` is known, else the standard normal.	`0.05`
`template`	`str`	Journal preset name. One of `"aer"`, `"qje"`, `"econometrica"`, `"restat"`, `"jf"`, `"aeja"`, `"jpe"`, `"restud"`. When set, fills in defaults for `star_levels`, the SE-row footer label (e.g. QJE → "Robust standard errors"), the default `stats` selection (e.g. JF/AEJA include Adj. R²), and any extra notes — but every explicit kwarg you pass still wins. See :data:`statspai.output._journals.JOURNALS`.	`None`
`diagnostics`	`('auto', 'off')`	Auto-extract publication-quality diagnostic rows from the result objects: (1) FE / Cluster indicators: one row per distinct fixed effect variable (AER style: `"Firm FE: Yes/No"`, `"Year FE: Yes/No"`; interactions render as `"Firm × Year FE"`), plus `"Cluster SE: <var>"`, falling back to a single `"Fixed Effects: Yes/No"` row when FE metadata is present but unparseable; (2) IV: first-stage F (Olea-Pflueger / KP), Hansen-J p; (3) DiD: pre-trend p-value, treated-group count; (4) RD: bandwidth, kernel, polynomial order. `"auto"` (and `True`) emit only rows where at least one column produces a non-empty cell; `False` / `"off"` disables all auto-extraction. User-supplied `add_rows` always override.	`'auto'`
`multi_se`	`dict`	Stack additional SE specifications under the primary SE row. Keys are display labels (e.g. `"Cluster SE"`, `"Bootstrap SE"`) and values are sequences of :class:`pandas.Series` or dicts (one per model column) mapping coefficient names to SE values. Bracket styles cycle `[]`/`{}`/`⟨⟩`/`\|\|`. Footer notes record each label automatically.	`None`
`repro`	`bool or dict`	Append a reproducibility metadata note (StatsPAI version, optional seed and data hash, timestamp) as the last footer line. `True` emits the version + timestamp only. Pass a dict to record more: `{"data": df, "seed": 42, "extra": "git@<sha>"}`.	`None`
`eform`	`bool or list of bool`	Report exponentiated coefficients — odds ratios for `logit` / `probit`, incidence-rate ratios for `poisson`, hazard ratios for Cox-style models. Standard errors use the delta method (`exp(b)·SE(b)`), CI bounds are `(exp(lo), exp(hi))` of the original endpoints, and t / p values are unchanged because `H_0: b=0` is equivalent to `H_0: exp(b)=1`. Pass a per-model list (length matches `n_models`) to mix transformed and untransformed columns (e.g. logit + OLS in the same table). A footer note transparently flags which columns are exponentiated.	``False``
`column_spanners`	`list of (label, span)`	Multi-row header above the model labels — each tuple groups `span` consecutive columns under `label`. Spans must partition all model columns (sum equals `n_models`). Renders as `\multicolumn{n}{c}{label}` + `\cmidrule` in LaTeX, `colspan="n"` in HTML, repeated bold cells in Markdown, and centered ASCII in text. Word and Excel exports inherit `to_dataframe()`'s flat column model and currently omit the spanner row — use the LaTeX or HTML output for paper-grade spanners. Mirrors Stata `mgroups()` and R `modelsummary`'s `group` argument. Example: `column_spanners=[("OLS", 2), ("IV", 2)]` over four models.	`None`
`coef_map`	`dict`	Single-shot rename + reorder + drop. Mirrors R `modelsummary`'s `coef_map`: pass an ordered dict whose keys are coefficient names to keep (in display order) and values are the rendered labels. Variables not in `coef_map` are dropped. Mutually exclusive with `coef_labels` / `keep` / `drop` / `order` — pass either the unified map or the legacy four-parameter spec.	`None`
`consistency_check`	`bool`	When two or more columns are passed and their sample sizes differ, emit a `UserWarning`. Reviewer red flag — disable by setting `False` (or annotate with `notes=[...]`) when the N-mismatch is intentional (IV first stage on a subsample, RD bandwidth restriction, etc.).	`True`
`estimate`	`str`	Custom format string for the top (coefficient) line in each cell. Mirrors R `modelsummary`'s `estimate=` argument. Placeholders: `{estimate}`, `{stars}`, `{std_error}`, `{t_value}`, `{p_value}`, `{conf_low}`, `{conf_high}`. Default `"{estimate}{stars}"`. Pass e.g. `"{stars}{estimate}"` for stars-first, or `"{estimate} ({std_error}){stars}"` for an inline single-line cell.	`None`
`statistic`	`str`	Custom format string for the bottom (statistic) line in each cell. Same placeholders as `estimate`. Default depends on `se_type`: `"({std_error})"` for `se`, `"[{conf_low}, {conf_high}]"` for `ci`, etc. Pass e.g. `"t={t_value}, p={p_value}"` for working-paper-style cells.	`None`
`notation`	``"stars"`` \| ``"symbols"`` \| tuple of 3 strings	Family of significance markers used when `stars=True`. `"stars"` (default) → `("*", "**", "***")`; `"symbols"` → `("†", "‡", "§")` (AER / JPE alternative when stars conflict with footnote markers); a 3-tuple of custom strings is accepted, ordered low → high.	`'stars'`
`apply_coef`	`callable`	Apply an arbitrary transformation `f(b)` to each rendered coefficient. Generalises `eform` (which is shorthand for `apply_coef=np.exp`). Useful for percentage transforms (`apply_coef=lambda b: 100*b`), log scales, or signed sqrt for distortion measures. Pair with `apply_coef_deriv` for delta-method SE rescaling. Mutually exclusive with `eform`.	`None`
`apply_coef_deriv`	`callable`	Derivative `f'(b)` of the `apply_coef` callable. When provided, SEs are rescaled as `\|f'(b)\| · SE(b)`. When omitted, SEs stay on the original scale and a footer warns the reader.	`None`
`escape`	`bool`	Auto-escape user-supplied label strings (`coef_labels`, `model_labels`, `panel_labels`, `dep_var_labels`, `column_spanners` labels, `add_rows` labels and values, `title`) for the active output format (LaTeX / HTML). Pass `escape=False` when those strings already contain raw markup you want to preserve verbatim — e.g. math-mode coefficient names like `"$\beta_1$"`, or HTML tags like `"<i>β</i>"`. Cell content (numeric estimates, computed stats) is unaffected; it never contains user-controlled metacharacters.	`True`
`tests`	`dict`	Render hypothesis-test rows in the diagnostic strip below the stats block. Keys are display labels ("F-test x1=0", "Hansen J p-value", "Wald χ²"); values are sequences whose length equals `n_models`. Each per-model entry can be: a `(statistic, pvalue)` tuple → `"<stat>*"` (stars from p); a bare `pvalue` float → `"<p>*"`; `None` / `NaN` → empty cell; any string → passed through as-is. Stars honour the configured `notation` family for cross-table consistency. Closes the gap to Stata `estadd scalar` / `test` integration where reviewers expect Wald / Sargan / Hansen-J / first-stage F right under the main results block.	`None`
`fixef_sizes`	`bool`	Auto-emit "# Firm: 1,234" / "# Year: 30" rows showing the number of distinct levels per fixed effect. Reads `model_info['n_fe_levels']` from each result — currently populated by `count.py` (Poisson/NegBin) and the pyfixest adapter; other estimators silently no-op. Mirrors R fixest's `etable(..., fixef_sizes=TRUE)`.	`False`
`vcov`	`str`	Recompute the SE / t / p / 95% CI columns at print time using a different variance estimator. Currently supports OLS-style results that store `data_info['X']` and `data_info['residuals']`. Options: `"HC0"` (White heteroskedasticity-robust); `"HC1"` / `"robust"` (Stata's `robust`, HC0 × n/(n-k)); `"HC2"` (leverage-weighted); `"HC3"` (leverage-squared; recommended for small samples; Long-Ervin). Columns whose underlying result lacks the X/residuals fields emit a `UserWarning` and retain their fit-time SEs, so a heterogeneous mix of OLS + non-OLS does not blow up; the warning lists the affected columns so the user can audit.	`None`
`transpose`	`bool`	Render with axes swapped: rows become models, columns become variables. Single-panel only; multi-panel input or `multi_se=` is rejected with `NotImplementedError` to keep the layout pivot semantics tight. Currently supports text and HTML renderers.	`False`
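
The interplay of `star_levels`, `notation`, and the `estimate` / `statistic` templates in the table above can be sketched with `str.format`. This is a minimal illustration, not the library's renderer: it assumes the conventional `*` / `**` / `***` markers ordered low to high, a strict `p < threshold` comparison, and fixed three-decimal formatting; `star_marker` and `render_cell` are hypothetical names.

```python
from typing import Sequence


def star_marker(p_value: float,
                levels: Sequence[float] = (0.10, 0.05, 0.01),
                markers: Sequence[str] = ("*", "**", "***")) -> str:
    """Return the marker for the strictest threshold that p_value falls below."""
    marker = ""
    for level, symbol in zip(levels, markers):
        if p_value < level:  # assumed strict inequality
            marker = symbol
    return marker


def render_cell(template: str, **values: float) -> str:
    """Fill an estimate=/statistic= style template with 3-decimal numbers."""
    stars = star_marker(values["p_value"])
    numbers = {name: f"{number:.3f}" for name, number in values.items()}
    return template.format(stars=stars, **numbers)


stats = {"estimate": 2.067, "std_error": 0.074, "p_value": 0.0004}
print(render_cell("{estimate}{stars}", **stats))                # 2.067***
print(render_cell("({std_error})", **stats))                    # (0.074)
print(render_cell("{estimate} ({std_error}){stars}", **stats))  # 2.067 (0.074)***
print(star_marker(0.03, markers=("†", "‡", "§")))               # ‡
```

The first two calls correspond to the default top and bottom lines of a cell when `se_type="se"`; the third is the inline single-line variant mentioned for `estimate`.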

Returns:

Type	Description
`RegtableResult`	Object with `.to_text()`, `.to_latex()`, `.to_html()`, `.to_markdown()`, `.to_excel(filename)`, `.to_word(filename)`, `.to_dataframe()`, `.to_dict()` / `.to_json()` (agent-native), `.save(filename)` methods. Renders as rich HTML in Jupyter notebooks via `_repr_html_()`.
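The per-model entries accepted by `tests=` can be sketched in plain Python. This is only an illustration of the mapping described in the parameter table, not StatsPAI's implementation: `format_test_cell` and `stars_for` are hypothetical helpers, and the `*` / `**` / `***` symbols with thresholds `(0.1, 0.05, 0.01)` are an assumed notation.

```python
import math

def stars_for(p, star_levels=(0.1, 0.05, 0.01)):
    """One star per threshold that p falls below (assumed notation)."""
    return "*" * sum(p < level for level in star_levels)

def format_test_cell(entry, fmt="%.3f"):
    """Hypothetical helper: turn one per-model `tests=` entry into cell text."""
    if entry is None:
        return ""                              # None -> empty cell
    if isinstance(entry, str):
        return entry                           # strings pass through as-is
    if isinstance(entry, tuple):
        stat, pvalue = entry                   # (statistic, pvalue)
        return fmt % stat + stars_for(pvalue)  # statistic, stars from p
    if math.isnan(entry):
        return ""                              # NaN -> empty cell
    return fmt % entry + stars_for(entry)      # bare p-value

row = [(12.3, 0.004), 0.2, None, "n/a"]
print([format_test_cell(e) for e in row])
```

A sequence like `row` would be one value in the `tests` dict, with one entry per model column.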

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> m1 = sp.regress("log_wage ~ education", data=df)
>>> m2 = sp.regress("log_wage ~ education + experience", data=df)
>>> table = sp.regtable(m1, m2)
>>> type(table).__name__
'RegtableResult'
>>> bool("education" in table.to_text().lower())
True
>>> sp.regtable(m1, m2, output="latex", filename="table1.tex")

Unified coef_map (rename + reorder + drop in one shot):

>>> table2 = sp.regtable(m1, m2, coef_map={
...     "experience": "Experience",
...     "education": "Years of schooling",
... })
>>> bool(table2.to_latex().strip())
True

Odds ratios via eform for a logit model:

>>> lg = sp.logit("union ~ education", data=df)
>>> type(sp.regtable(lg, eform=True)).__name__
'RegtableResult'
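The `vcov=` choices differ only in how each squared residual is weighted inside the sandwich estimator. The sketch below works this out for a single regressor plus intercept using the textbook HC0 to HC3 weights. It is a pure-Python illustration under that assumption, `hc_standard_errors` is a hypothetical helper, and it does not reproduce how `regtable` reads `data_info['X']` / `data_info['residuals']`.

```python
import math

def hc_standard_errors(x, y, kind="HC1"):
    """[SE(intercept), SE(slope)] for y = a + b*x under HC0-HC3."""
    n, k = len(x), 2
    # (X'X)^{-1} for the design [1, x], written out for the 2x2 case.
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # OLS coefficients: beta = (X'X)^{-1} X'y.
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    beta = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        row = (1.0, xi)
        e = yi - beta[0] - beta[1] * xi
        # Leverage h_i = x_i' (X'X)^{-1} x_i.
        h = sum(row[r] * inv[r][c] * row[c] for r in range(2) for c in range(2))
        weight = {
            "HC0": 1.0,                 # White
            "HC1": n / (n - k),         # degrees-of-freedom rescaling
            "HC2": 1 / (1 - h),         # leverage-weighted
            "HC3": 1 / (1 - h) ** 2,    # leverage-squared
        }[kind]
        for r in range(2):
            for c in range(2):
                meat[r][c] += weight * e * e * row[r] * row[c]
    # Sandwich: (X'X)^{-1} * meat * (X'X)^{-1}; SEs are sqrt of the diagonal.
    left = [[sum(inv[r][m] * meat[m][c] for m in range(2)) for c in range(2)]
            for r in range(2)]
    cov = [[sum(left[r][m] * inv[m][c] for m in range(2)) for c in range(2)]
           for r in range(2)]
    return [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.3, 1.8, 3.4, 3.7, 5.6]
for kind in ("HC0", "HC1", "HC2", "HC3"):
    print(kind, hc_standard_errors(x, y, kind))
```

Because every leverage lies strictly between 0 and 1 here, the HC2 and HC3 standard errors are never smaller than HC0, which is why HC3 is the conservative choice in small samples.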

eststo ¶

eststo(result: Any, *, name: Optional[str] = None) -> None

Store a model result (like Stata's estimates store).

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> sp.estclear()
>>> sp.eststo(sp.regress("log_wage ~ education", data=df), name="(1)")
>>> sp.eststo(
...     sp.regress("log_wage ~ education + experience", data=df), name="(2)"
... )
>>> sp.estclear()

estclear ¶

estclear() -> None

Clear all stored model results.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> sp.eststo(sp.regress("log_wage ~ education", data=df), name="(1)")
>>> sp.estclear()

esttab ¶

esttab(*results: Any, names: Optional[Sequence[str]] = None, se: bool = True, t: bool = False, p: bool = False, ci: bool = False, stars: bool = True, star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, labels: Optional[Dict[str, str]] = None, stats: Optional[Sequence[str]] = None, fmt: str = '%.4f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[Sequence[str]] = None, alpha: float = 0.05) -> EstimateTableResult

Stata-style esttab — thin facade over :func:sp.regtable.

Deprecated: esttab is now a thin wrapper over :func:statspai.output.regtable. Use sp.regtable(*models, ...) directly for full control. See the module docstring for the parameter mapping.

Accepts model results directly as positional arguments or reads from the global store populated by :func:eststo. If both positional arguments and a non-empty store exist, positional arguments take precedence.

Parameters mirror the original esttab API; see the module docstring for the exact mapping to regtable.

Examples:

>>> import warnings
>>> import statspai as sp
>>> df = sp.cps_wage()
>>> r1 = sp.regress("log_wage ~ education", data=df)
>>> r2 = sp.regress("log_wage ~ education + experience", data=df)
>>> with warnings.catch_warnings():
...     warnings.simplefilter("ignore", DeprecationWarning)
...     tab = sp.esttab(r1, r2, output="text")
>>> tab.to_dataframe().shape[1]  # one column per model
2
>>> "education" in tab.to_text()
True
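The precedence rule between positional results and the global store can be pictured with a small registry. This is a sketch of the described behaviour only: `_STORE`, `store`, `clear`, and `resolve` are hypothetical names, and the auto-generated `"(1)"`, `"(2)"` labels for unnamed results are an assumption.

```python
# Hypothetical stand-in for the global store that eststo fills.
_STORE = {}

def store(result, name=None):
    """Register a result, auto-labelling it when no name is given (assumed)."""
    key = name if name is not None else f"({len(_STORE) + 1})"
    _STORE[key] = result

def clear():
    """Drop every stored result."""
    _STORE.clear()

def resolve(*results):
    """Positional results win; otherwise fall back to the store, in order."""
    return list(results) if results else list(_STORE.values())

clear()
store("model-a", name="(1)")
store("model-b")
print(resolve())            # falls back to the stored results
print(resolve("model-c"))   # positional argument takes precedence
clear()
```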

coefplot ¶

coefplot(*models: Any, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, ax: Any = None, figsize: tuple = (8, 6), colors: Optional[List[str]] = None, title: Optional[str] = None, alpha: float = 0.05) -> Any

Forest plot comparing coefficients across models.

Parameters:

Name	Type	Description	Default
`*models`	`Any`	Model result objects.	`()`
`model_names`	`list of str`		`None`
`variables`	`list of str`	Which variables to plot. Default: all shared variables.	`None`
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(8, 6)`
`colors`	`list of str`		`None`
`title`	`str`		`None`
`alpha`	`float`	Significance level for CIs.	`0.05`

Returns:

Type	Description
`(fig, ax)`	The Matplotlib figure and axes holding the plot.

Examples:

Forest plot of the slope on x across two nested models:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> df = pd.DataFrame({"x": rng.normal(0, 1, n), "z": rng.normal(0, 1, n)})
>>> df["y"] = 1.0 + 2.0 * df["x"] - 0.5 * df["z"] + rng.normal(0, 1, n)
>>> m1 = sp.regress("y ~ x", data=df)
>>> m2 = sp.regress("y ~ x + z", data=df)
>>> fig, ax = sp.coefplot(m1, m2, model_names=["M1", "M2"], variables=["x"])
>>> ax.get_xlabel()
'Coefficient Estimate'
>>> fig.savefig("coefplot.png")

coefplot_tikz ¶

coefplot_tikz(*models: Any, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, level: float = 0.95, title: Optional[str] = None, xlabel: str = 'Coefficient estimate', standalone: bool = False) -> str

Return pgfplots / TikZ source for a coefficient forest plot.

The vector-graphics, LaTeX-native counterpart to :func:coefplot (which returns a Matplotlib (fig, ax) you can fig.savefig("plot.pdf") / .png). Each model becomes one \addplot series of point estimates with horizontal confidence-interval error bars; variables run down the y-axis with a dashed reference line at zero — the same forest layout as :func:coefplot, emitted as editable LaTeX.

Parameters:

Name	Type	Description	Default
`*models`	`Any`	Model result objects (`EconometricResults` / `CausalResult` / any object exposing `params` / `std_errors`).	`()`
`model_names`	`list of str`	Legend labels. Default `Model 1`, `Model 2`, …	`None`
`variables`	`list of str`	Which coefficients to plot. Default: every shared variable, sorted.	`None`
`coef_labels`	`dict`	Rename variables on the y-axis, e.g. `{"x": "Treatment"}`.	`None`
`level`	`float`	Confidence level for the error bars (normal approximation `b ± z·se`, matching :func:`coefplot`).	`0.95`
`title`	`str`	Plot title. Default `"Coefficient plot"`.	`None`
`xlabel`	`str`	x-axis label.	`"Coefficient estimate"`
`standalone`	`bool`	When `True`, wrap the `tikzpicture` in a compilable `standalone` document (with the required `\usepackage{pgfplots}`). Otherwise return just the `tikzpicture` to `\input` into a paper (needs `\usepackage{pgfplots}` in the preamble).	`False`

Returns:

Type	Description
`str`	`pgfplots` source.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> df = pd.DataFrame({"x": rng.normal(0, 1, n), "z": rng.normal(0, 1, n)})
>>> df["y"] = 1.0 + 2.0 * df["x"] - 0.5 * df["z"] + rng.normal(0, 1, n)
>>> m1 = sp.regress("y ~ x + z", data=df)
>>> tikz = sp.coefplot_tikz(m1, coef_labels={"x": "Treatment"})
>>> isinstance(tikz, str)
True
>>> with open("coefplot.tex", "w") as fh:
...     _ = fh.write(tikz)
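The error bars use the normal approximation `b ± z·se` at the requested `level`. A minimal standard-library sketch of that interval follows; `normal_ci` is a hypothetical helper, not part of StatsPAI.

```python
from statistics import NormalDist

def normal_ci(b, se, level=0.95):
    """Two-sided normal-approximation interval b ± z * se."""
    # z is the (1 + level) / 2 quantile of the standard normal.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return b - z * se, b + z * se

lo, hi = normal_ci(2.0, 0.1, level=0.95)
print(round(lo, 3), round(hi, 3))
```

A `level` of 0.95 here corresponds to `alpha=0.05` in :func:coefplot, since `level = 1 - alpha`.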

balance_table ¶

balance_table(data: DataFrame, treat: str, covariates: List[str], output: str = 'text', title: str = 'Balance Table', fmt: str = '%.3f', labels: Optional[Dict[str, str]] = None, test: str = 'ttest') -> Union[str, DataFrame]

Generate a balance table comparing treated and control groups.

Standard Table 1 for matching, DID, and RCT papers.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`treat`	`str`	Binary treatment variable (0/1).	required
`covariates`	`list of str`	Variables to check balance on.	required
`output`	`str`	'text', 'latex', 'html', 'dataframe', or filepath.	`'text'`
`title`	`str`	Table title.	`'Balance Table'`
`fmt`	`str`	Number format.	`'%.3f'`
`labels`	`dict`	Variable labels.	`None`
`test`	`str`	Test for difference: 'ttest' or 'ranksum'.	`'ttest'`

Returns:

Type	Description
`str or DataFrame`	The rendered table as a string, or a `DataFrame` when `output='dataframe'`.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> bal = sp.balance_table(
...     df, treat='union',
...     covariates=['education', 'experience', 'tenure'],
...     output='dataframe',
... )
>>> import pandas as pd
>>> bool(isinstance(bal, pd.DataFrame))
True
>>> cov = ['education', 'experience', 'tenure']
>>> sp.balance_table(
...     df, treat='union', covariates=cov, output='balance.docx'
... )
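Each row of a balance table boils down to two group means, their difference, and a test statistic. The sketch below computes one such row with the standard library. `balance_row` is a hypothetical helper, and the unequal-variance (Welch) form of the t statistic is an assumption, since the text above does not say which variant `test='ttest'` uses.

```python
import math
from statistics import mean, variance

def balance_row(values, treat):
    """Means by treatment arm, their difference, and a Welch t statistic."""
    treated = [v for v, d in zip(values, treat) if d == 1]
    control = [v for v, d in zip(values, treat) if d == 0]
    diff = mean(treated) - mean(control)
    # Welch standard error: sample variances are not pooled (assumption).
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return {
        "mean_treated": mean(treated),
        "mean_control": mean(control),
        "diff": diff,
        "t": diff / se,
    }

education = [12, 16, 14, 10, 12, 11]   # made-up illustration data
union = [1, 1, 1, 0, 0, 0]
print(balance_row(education, union))
```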

collect ¶

collect(title: Optional[str] = None, *, template: str = 'aer') -> Collection

Construct a fresh :class:Collection.

Convenience factory mirroring Stata 15's collect workflow.

Parameters:

Name	Type	Description	Default
`title`	`str`	Document title shown at the top of the rendered output.	`None`
`template`	`('aer', 'qje', 'econometrica', 'restat')`	Book-tab styling preset forwarded to the renderers.	`'aer'`

Returns:

Type	Description
`Collection`	An empty, ordered container; add items via its `add_*` methods.

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> m1 = sp.regress("log_wage ~ education", data=df)
>>> m2 = sp.regress("log_wage ~ education + experience", data=df)
>>> c = sp.collect("Wage analysis")
>>> _ = c.add_regression(m1, m2, name="main", title="Table 1")
>>> _ = c.add_summary(df, vars=["log_wage", "education"], title="Table 2")
>>> len(c)
2
>>> c.list()["kind"].tolist()
['regtable', 'summary']
>>> md = c.to_markdown()  # or c.save("paper.docx")

cite ¶

cite(result: Any, term: Optional[str] = None, *, fmt: str = '%.3f', output: str = 'text', star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), second_row: str = 'se', alpha: float = 0.05, bold_estimate: bool = False) -> str

Format a single coefficient as an inline citation string.

Parameters:

Name	Type	Description	Default
`result`	`object`	Any StatsPAI result with either `params`/`std_errors` (econometric) or `estimate`/`se` (causal).	required
`term`	`str`	Coefficient name. Defaults to the headline `estimand` for causal results, or the first row of `params` for econometric results.	`None`
`fmt`	`str`	`printf`-style format string.	`"%.3f"`
`output`	`str`	One of `"text"`, `"latex"`, `"markdown"`/`"md"`, `"html"`.	`"text"`
`star_levels`	`tuple of float`	Star thresholds, same convention as :func:`sp.regtable`.	`(0.10, 0.05, 0.01)`
`second_row`	`str`	What to put in parentheses after the estimate. One of: `"se"` (standard error in `(...)`); `"t"` (t-statistic in `(...)`); `"p"` (p-value in `(...)`); `"ci"` (confidence interval in `[lo, hi]`); `"none"` (omit the second row entirely).	`"se"`
`alpha`	`float`	CI level when `second_row="ci"`.	`0.05`
`bold_estimate`	`bool`	For `output="text"` / `"latex"`, whether to bold the estimate (HTML / Markdown bold the estimate by default for readability).	`False`

Returns:

Type	Description
`str`	The formatted inline citation string.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> df = pd.DataFrame({
...     "x": rng.normal(size=200),
...     "treat": rng.integers(0, 2, size=200),
... })
>>> df["y"] = 1.0 + 0.5 * df["x"] + 0.8 * df["treat"] + rng.normal(size=200)
>>> m = sp.regress("y ~ x + treat", data=df)
>>> s = sp.cite(m, "treat")           # estimate*** (se)
>>> isinstance(s, str)
True
>>> sp.cite(m, "treat", output="latex")
'0.716^{***}~(0.143)'
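The string layouts shown above (`estimate*** (se)` for text, `estimate^{***}~(se)` for LaTeX) can be assembled by hand as follows. This sketch covers only those two layouts with `second_row="se"`; `inline_cite` is a hypothetical helper, and dropping the superscript when no star applies is an assumption.

```python
def inline_cite(est, se, p, fmt="%.3f", output="text",
                star_levels=(0.1, 0.05, 0.01)):
    """Format `estimate<stars> (se)` for text or LaTeX output."""
    stars = "*" * sum(p < level for level in star_levels)
    est_s, se_s = fmt % est, fmt % se
    if output == "latex":
        sup = "^{" + stars + "}" if stars else ""   # assumed: no empty ^{}
        return est_s + sup + "~(" + se_s + ")"      # ~ is a non-breaking space
    return est_s + stars + " (" + se_s + ")"

print(inline_cite(0.716, 0.143, 0.0001))
print(inline_cite(0.716, 0.143, 0.0001, output="latex"))
```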

list_journal_templates ¶

list_journal_templates() -> Tuple[str, ...]

Return the canonical names of every registered journal preset.

Examples:

>>> import statspai as sp
>>> names = sp.list_journal_templates()
>>> 'aer' in names and 'qje' in names
True

get_journal_template ¶

get_journal_template(name: str) -> Dict[str, Any]

Look up a journal preset by name (case-insensitive).

Parameters:

Name	Type	Description	Default
`name`	`str`	Template identifier (e.g. `"aer"` / `"AER"` / `"jf"`).	required

Returns:

Type	Description
`dict`	A copy of the preset entry. Callers are free to mutate it.

Raises:

Type	Description
`ValueError`	If name is not a registered template.

Examples:

>>> import statspai as sp
>>> tpl = sp.get_journal_template('AER')   # case-insensitive
>>> tpl['label']
'American Economic Review'
>>> tpl['se_label']
'Standard errors'
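The lookup contract (case-insensitive name, a copy that is safe to mutate, `ValueError` for unknown names) can be sketched with a tiny registry. `_TEMPLATES` and `get_template` are hypothetical, and the single entry only reuses the two fields shown in the example above; real presets presumably carry more.

```python
import copy

# Hypothetical registry with one preset for illustration.
_TEMPLATES = {
    "aer": {"label": "American Economic Review", "se_label": "Standard errors"},
}

def get_template(name):
    """Return a copy of the preset so callers can mutate it freely."""
    key = name.strip().lower()            # case-insensitive lookup
    if key not in _TEMPLATES:
        raise ValueError(f"unknown journal template: {name!r}")
    return copy.deepcopy(_TEMPLATES[key])

tpl = get_template("AER")
tpl["se_label"] = "Robust standard errors"     # local tweak only
print(get_template("aer")["se_label"])         # registry entry is unchanged
```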

attach_provenance ¶

attach_provenance(result: Any, *, function: str, params: Optional[Mapping[str, Any]] = None, data: Optional[Any] = None, enabled: bool = True, overwrite: bool = False) -> Any

Attach a :class:Provenance record as result._provenance.

Parameters:

Name	Type	Description	Default
`result`	`object`	The estimator result. Must accept attribute assignment; `CausalResult` / `ResultBase` / dataclasses / SimpleNamespace all work. Tuples / dicts / immutable types do not — for those the call is a silent no-op.	required
`function`	`str`	Logical name of the producing call, e.g. `"statspai.did.callaway_santanna"`.	required
`params`	`mapping`	Call arguments. Will be summarised (frames hashed; long sequences truncated; non-serialisable values reduced to repr).	`None`
`data`	`DataFrame / Series / ndarray`	The estimator's input data. Used to compute a 12-char SHA-256 fingerprint.	`None`
`enabled`	`bool`	Set to False to skip provenance entirely (zero-overhead path).	`True`
`overwrite`	`bool`	If False (default) and `result._provenance` already exists, do nothing — preserves the first (most-specific) record set by an inner estimator.	`False`

Returns:

Name	Type	Description
`result`	`same object`	Returned for chaining: `return attach_provenance(res, ...)`.

Notes

Failures are swallowed. Provenance must never break the caller — if attribute assignment isn't possible, we no-op and move on.

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> from types import SimpleNamespace
>>> df = pd.DataFrame({"y": [1.0, 2.0, 3.0], "x": [0.1, 0.2, 0.3]})
>>> res = SimpleNamespace(estimate=1.23)
>>> _ = sp.attach_provenance(res, function="sp.did.callaway_santanna",
...                          params={"method": "dr"}, data=df)
>>> res._provenance.function
'sp.did.callaway_santanna'
>>> res._provenance.data_shape
[3, 2]
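The attachment rules described above (skip when disabled, keep the first record unless `overwrite=True`, never raise on immutable targets) can be summarised in a few lines. This is a behavioural sketch with a hypothetical `attach` helper and plain strings standing in for :class:Provenance records.

```python
from types import SimpleNamespace

def attach(result, record, *, enabled=True, overwrite=False):
    """Set result._provenance following the rules described above."""
    if not enabled:
        return result                      # zero-overhead path
    if getattr(result, "_provenance", None) is not None and not overwrite:
        return result                      # first (innermost) record wins
    try:
        result._provenance = record
    except Exception:
        pass                               # tuples, dicts, ...: silent no-op
    return result

res = SimpleNamespace()
attach(res, "inner estimator")
attach(res, "outer wrapper")               # ignored: a record already exists
print(res._provenance)
print(attach((1, 2), "ignored"))           # tuple comes back untouched
```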

get_provenance ¶

get_provenance(result: Any) -> Optional[Provenance]

Return result._provenance if present, else None.

Walks one level of common containers (dict, list, tuple) — useful when an estimator returns a tuple (result, diagnostics).

Examples:

>>> import statspai as sp
>>> from types import SimpleNamespace
>>> res = SimpleNamespace(estimate=1.23)
>>> _ = sp.attach_provenance(res, function="sp.iv.ivreg")
>>> prov = sp.get_provenance(res)
>>> isinstance(prov, sp.Provenance)
True
>>> prov.function
'sp.iv.ivreg'
>>> sp.get_provenance(SimpleNamespace()) is None  # no record attached
True
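The one-level container walk can be sketched as below. `find_provenance` is a hypothetical helper; returning the first record found, and searching dict values rather than keys, are assumptions about details the description leaves open.

```python
from types import SimpleNamespace

def find_provenance(result):
    """Look on the object itself, then one level into dict/list/tuple."""
    direct = getattr(result, "_provenance", None)
    if direct is not None:
        return direct
    if isinstance(result, dict):
        children = result.values()         # assumed: values, not keys
    elif isinstance(result, (list, tuple)):
        children = result
    else:
        return None
    for child in children:
        found = getattr(child, "_provenance", None)
        if found is not None:
            return found                   # assumed: first match wins
    return None

fit = SimpleNamespace(_provenance="sp.iv.ivreg")
diagnostics = {"first_stage_F": 23.1}
print(find_provenance((fit, diagnostics)))
```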

compute_data_hash ¶

compute_data_hash(data: Any, length: int = 12) -> Optional[str]

Return a short SHA-256 fingerprint of data, or None.

Accepts:

pandas.DataFrame: order- and column-name-sensitive hash.
pandas.Series: hashed via hash_pandas_object.
numpy.ndarray: bytes-hashed (shape-included).
bytes / bytearray: direct SHA-256.

Anything else returns None rather than raising — provenance must never break the calling estimator.

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> df = pd.DataFrame({"y": [1.0, 2.0, 3.0], "x": [0.1, 0.2, 0.3]})
>>> h = sp.compute_data_hash(df)
>>> len(h)
12
>>> bool(sp.compute_data_hash(df) == h)  # deterministic
True
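For the `bytes` / `bytearray` branch, a truncated SHA-256 fingerprint needs only `hashlib`. The sketch below covers that branch alone and returns `None` for everything else; `short_hash` is a hypothetical helper, and the pandas and NumPy branches are deliberately left out.

```python
import hashlib

def short_hash(data, length=12):
    """First `length` hex characters of SHA-256, or None if unsupported."""
    if not isinstance(data, (bytes, bytearray)):
        return None                        # never raise on unknown input
    return hashlib.sha256(bytes(data)).hexdigest()[:length]

print(short_hash(b"y,x\n1.0,0.1\n"))
print(short_hash(3.14))                    # unsupported type -> None
```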

format_provenance ¶

format_provenance(prov: Provenance, *, indent: int = 2) -> str

Pretty multi-line rendering of a :class:Provenance record.

Examples:

>>> import statspai as sp
>>> prov = sp.Provenance(function="sp.rd.rdrobust",
...                      params={"kernel": "triangular"})
>>> text = sp.format_provenance(prov)
>>> text.splitlines()[0]
'Provenance'
>>> "sp.rd.rdrobust" in text
True

lineage_summary ¶

lineage_summary(*results: Any) -> Dict[str, Any]

Aggregate a lineage report across multiple results.

Useful for sp.replication_pack / Quarto appendix generation: pass every fitted result the paper depends on and get back a {run_id: provenance_dict} map plus a deduped list of input data hashes.

Examples:

>>> import statspai as sp
>>> from types import SimpleNamespace
>>> r1, r2 = SimpleNamespace(), SimpleNamespace()
>>> _ = sp.attach_provenance(r1, function="sp.did.callaway_santanna")
>>> _ = sp.attach_provenance(r2, function="sp.iv.ivreg")
>>> report = sp.lineage_summary(r1, r2)
>>> report["n_runs"]
2
>>> sorted(report)
['data_inputs', 'n_runs', 'python_version', 'runs', 'statspai_version']
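The aggregation amounts to keying records by run and deduplicating their data hashes. The sketch below shows a reduced version of that report, without the version fields; `summarize` is a hypothetical helper, and the record keys `run_id` and `data_hash` are assumptions.

```python
def summarize(records):
    """Build a reduced lineage report from provenance-like dicts."""
    runs = {rec["run_id"]: rec for rec in records}
    # dict.fromkeys deduplicates while preserving first-seen order.
    data_inputs = list(dict.fromkeys(
        rec["data_hash"] for rec in records if rec.get("data_hash")
    ))
    return {"n_runs": len(runs), "runs": runs, "data_inputs": data_inputs}

records = [
    {"run_id": "r1", "function": "sp.did.callaway_santanna", "data_hash": "aaa111"},
    {"run_id": "r2", "function": "sp.iv.ivreg", "data_hash": "aaa111"},
    {"run_id": "r3", "function": "sp.rd.rdrobust", "data_hash": None},
]
report = summarize(records)
print(report["n_runs"], report["data_inputs"])
```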

replication_pack ¶

replication_pack(target: Any, output_path: Union[str, PathLike], *, data: Optional[Any] = None, code: Optional[str] = None, env: bool = True, bib: bool = True, paper_format: str = 'auto', title: str = 'Replication Pack', extra_files: Optional[Mapping[str, Union[str, bytes]]] = None, include_git_sha: bool = True, overwrite: bool = True) -> ReplicationPack

Build a replication archive.

Parameters:

Name	Type	Description	Default
`target`	`object`	Anything carrying analysis state. Best when it's a :class:`statspai.workflow.paper.PaperDraft` (we then auto-extract the rendered paper, the workflow's data, and any results with provenance) but plain estimator results, lists thereof, or even `None` (for "just pack data + script") all work.	required
`output_path`	`str or PathLike`	Destination `.zip` path. Created or overwritten.	required
`data`	`DataFrame / Series`	Explicit dataset. When omitted, we try `target.data` / `target.workflow.data`. If both fail, the archive omits the `data/` directory and warns in `MANIFEST.json`.	`None`
`code`	`str or PathLike`	Either an inline Python script (multi-line string) or a path to a .py file. When omitted, `code/` is also omitted and a warning is logged.	`None`
`env`	`bool`	Include `env/requirements.txt` from `pip freeze`. Disable to keep the archive small or to avoid the subprocess call.	`True`
`bib`	`bool`	Write `paper/paper.bib` from `target.citations` (or equivalent).	`True`
`paper_format`	`('auto', 'md', 'qmd', 'tex', 'docx')`	How to render the PaperDraft inside the archive. "auto" picks `draft.fmt` (and falls back to "md" for unknown formats).	`"auto"`
`title`	`str`	Used in `README.md`.	`'Replication Pack'`
`extra_files`	`mapping`	`{"path/in/zip.txt": "contents" or b"bytes"}` — anything custom you want stuffed into the archive.	`None`
`include_git_sha`	`bool`	Capture `git rev-parse HEAD` for `MANIFEST.json` (silently skipped when not in a repo).	`True`
`overwrite`	`bool`	Overwrite an existing archive at `output_path`.	`True`

Returns:

Type	Description
`ReplicationPack`	Summary object. `rp.output_path` is the on-disk archive; `rp.manifest` is the parsed `MANIFEST.json`; `rp.warnings` lists any partial-success notes.

Examples:

>>> import os, tempfile
>>> import statspai as sp
>>> df = sp.cps_wage()
>>> res = sp.regress("log_wage ~ education + experience", data=df)
>>> out = os.path.join(tempfile.mkdtemp(), "reppack.zip")
>>> rp = sp.replication_pack(res, out, env=False, bib=False,
...                          include_git_sha=False)
>>> type(rp).__name__
'ReplicationPack'
>>> bool(os.path.exists(rp.output_path))
True
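The archive layout described in the parameter table (`MANIFEST.json`, `README.md`, optional `data/` and `code/` directories, warnings for anything missing) can be mimicked with `zipfile`. This is a simplified sketch, not the library's packer: `build_pack` and the member names `data/data.csv` and `code/analysis.py` are hypothetical.

```python
import json
import os
import tempfile
import zipfile

def build_pack(output_path, *, data_csv=None, code=None,
               title="Replication Pack"):
    """Write a small replication zip and return its manifest dict."""
    warnings = []
    manifest = {"title": title, "files": []}
    with zipfile.ZipFile(output_path, "w") as zf:
        def add(name, text):
            zf.writestr(name, text)
            manifest["files"].append(name)

        add("README.md", f"# {title}\n")
        if data_csv is not None:
            add("data/data.csv", data_csv)
        else:
            warnings.append("no data supplied; data/ omitted")
        if code is not None:
            add("code/analysis.py", code)
        else:
            warnings.append("no code supplied; code/ omitted")
        manifest["warnings"] = warnings
        # The manifest goes in last so it can list everything else.
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return manifest

out = os.path.join(tempfile.mkdtemp(), "pack.zip")
manifest = build_pack(out, code="print('replicate me')\n")
print(manifest["files"], manifest["warnings"])
```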

to_gt ¶

to_gt(result: Any, *, title: Optional[str] = None, subtitle: Optional[str] = None, notes: Optional[Sequence[str]] = None, template: Optional[str] = None, rowname_col: Optional[str] = None, apply_theme: bool = True) -> 'gt_pkg.GT | list[gt_pkg.GT]'

Convert StatsPAI table / DataFrame / Collection into GT.

Dispatches on the input type:

:class:statspai.output.RegtableResult — full-fidelity adapter that picks up the table's rendered cells, journal preset, title, and footer notes.
:class:statspai.output.PaperTables — flattens panels into a single GT with tab_row_group per panel.
:class:statspai.output.Collection — converts each convertible item and returns a list[GT]; convertible items include RegtableResult, MeanComparisonResult, and any object with to_dataframe().
:class:pandas.DataFrame — wraps verbatim. rowname_col promotes a column to row labels.
Anything else with a to_dataframe() method — calls it.

Parameters:

Name	Type	Description	Default
`result`	`object`	See dispatch description above.	required
`title`	`str`	Override the table title. Default: pulled from `result.title` when present.	`None`
`subtitle`	`str`	Subtitle line (`tab_header(subtitle=...)`).	`None`
`notes`	`sequence of str`	Footer notes. When omitted, `result.notes` is used.	`None`
`template`	`str`	Journal preset (`"aer"` / `"qje"` / …). Default: pulled from `result.template` when present.	`None`
`rowname_col`	`str`	Column to elevate to GT's row label position. For `RegtableResult` we default to the variable column automatically; for plain DataFrames, pass the column name.	`None`
`apply_theme`	`bool`	Apply the journal-preset gt theme. Set False to keep gt's defaults (useful when the caller wants to compose their own `tab_style` chain).	`True`

Returns:

Type	Description
`GT`	A fresh GT instance, or a list of GT instances when `result` is a `Collection`. Chain `.tab_style(...)`, `.opt_*`, `.tab_spanner(...)`, etc. as you would in pure `great_tables`.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> x = rng.normal(0, 1, n)
>>> data = pd.DataFrame({'y': 1.0 + 0.5 * x + rng.normal(0, 1, n), 'x': x})
>>> rt = sp.regtable(sp.regress('y ~ x', data=data), title='Demo')
>>> g = sp.gt(rt)
>>> type(g).__name__
'GT'
>>> bool('<table' in g.as_raw_html())
True
>>> g.as_latex()

Plain DataFrame path — promote a column to row labels:

>>> df = pd.DataFrame({'var': ['x', 'y'], 'M1': ['0.5***', '0.3']})
>>> g2 = sp.gt(
...     df, rowname_col='var', title='Custom table'
... )
>>> type(g2).__name__
'GT'

Raises:

Type	Description
`ImportError`	If `great_tables` is not installed. Install via `pip install great_tables`.
`TypeError`	If `result` is not adaptable to a DataFrame.

is_great_tables_available ¶

is_great_tables_available() -> bool

Return True iff great_tables can be imported in this env.

Use this to guard optional :func:statspai.to_gt calls without triggering an ImportError when the package is absent.

Examples:

>>> import statspai as sp
>>> isinstance(sp.is_great_tables_available(), bool)
True
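A common way to probe for an optional dependency without importing it is `importlib.util.find_spec`. The sketch below shows that pattern; `optional_available` is a hypothetical helper, and whether StatsPAI performs the check this way is an assumption. A located module can still fail at import time, so the result is a hint rather than a guarantee.

```python
import importlib.util

def optional_available(module_name):
    """True if `module_name` can be located on the import path."""
    return importlib.util.find_spec(module_name) is not None

if optional_available("great_tables"):
    print("great_tables found: the GT adapter can be used")
else:
    print("great_tables missing: fall back to to_html() / to_latex()")
```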

csl_url ¶

csl_url(name: str) -> str

Return the canonical Zotero/styles URL for a CSL preset.

Use the URL once at project setup:

    curl -O $(python -c "import statspai as sp; print(sp.csl_url('aer'))")
    # → american-economic-association.csl in the current directory

Then point Quarto at the local copy via csl: paper-style.csl in the YAML header. PaperDraft.to_qmd(csl='aer') does the local filename pass-through automatically.

Raises:

Type	Description
`ValueError`	Unknown short name. Use :func:`list_csl_styles` to enumerate.

Examples:

>>> import statspai as sp
>>> url = sp.csl_url("aer")
>>> url.endswith("american-economic-association.csl")
True

csl_filename ¶

csl_filename(name: str) -> str

Return the canonical .csl filename (no path) for a preset.

Useful when emitting a csl: ... line into a Quarto YAML header where the user has already downloaded the style file alongside paper.qmd.

Examples:

>>> import statspai as sp
>>> sp.csl_filename("qje")
'the-quarterly-journal-of-economics.csl'
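Emitting the `csl:` line into a Quarto YAML header is plain string assembly. The sketch below assumes the style file sits next to `paper.qmd`; `quarto_header` and the default `paper.bib` bibliography name are hypothetical, and the filename is the one returned in the example above.

```python
def quarto_header(title, csl_file, bibliography="paper.bib"):
    """Build a minimal Quarto YAML front matter block."""
    lines = [
        "---",
        f'title: "{title}"',
        f"bibliography: {bibliography}",
        f"csl: {csl_file}",      # local copy of the downloaded style
        "---",
    ]
    return "\n".join(lines) + "\n"

print(quarto_header("Wage analysis", "the-quarterly-journal-of-economics.csl"))
```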

list_csl_styles ¶

list_csl_styles() -> List[Tuple[str, str]]

List (short_name, full_label) pairs for every registered style.

Examples:

>>> import statspai as sp
>>> styles = sp.list_csl_styles()
>>> ("aer", "American Economic Review") in styles
True

parse_citation_to_bib ¶

parse_citation_to_bib(citation: str, key: Optional[str] = None) -> Dict[str, Any]

Parse a citation string into a BibTeX-shaped dict.

Returns a dict with at least key, type (article / misc), and as many of (author, year, title, journal) as the regex can extract.

For full-fidelity bibliographies, write your paper.bib by hand or via Zotero — this is a "quick-start" parser, deliberately conservative.

Examples:

>>> import statspai as sp
>>> entry = sp.parse_citation_to_bib(
...     "Abadie A (2003). Semiparametric instrumental variable "
...     "estimation. Journal of Econometrics."
... )
>>> entry["type"]
'article'
>>> entry["key"]
'abadie2003semiparametric'
>>> sorted(entry["fields"])
['author', 'journal', 'title', 'year']

make_bib_key ¶

make_bib_key(citation: str) -> str

Compute a stable BibTeX key from a free-form citation string.

Format: firstauthor + year + first-title-word, e.g. "callaway2021difference". Falls back to a hash-derived key when we can't parse author+year.

Examples:

>>> import statspai as sp
>>> sp.make_bib_key(
...     "Abadie A (2003). Semiparametric instrumental variable "
...     "estimation. Journal of Econometrics."
... )
'abadie2003semiparametric'
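The `firstauthor + year + first-title-word` rule, with a hash fallback, can be approximated with one regular expression. This is an illustrative sketch: `sketch_bib_key`, the pattern, and the `ref` + SHA-1 fallback format are all assumptions rather than the library's actual parser.

```python
import hashlib
import re

# First word = first author's surname; "(YYYY)" = year; next word = title start.
_KEY_PATTERN = re.compile(r"^\s*([A-Za-z]+).*?\((\d{4})\)\.?\s*([A-Za-z]+)")

def sketch_bib_key(citation):
    """firstauthor + year + first title word, else a hash-derived key."""
    match = _KEY_PATTERN.search(citation)
    if match is None:
        digest = hashlib.sha1(citation.encode("utf-8")).hexdigest()[:8]
        return "ref" + digest              # assumed fallback format
    author, year, word = match.groups()
    return (author + year + word).lower()

print(sketch_bib_key(
    "Abadie A (2003). Semiparametric instrumental variable "
    "estimation. Journal of Econometrics."
))
print(sketch_bib_key("an unparseable note"))
```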

citations_to_bib_entries ¶

citations_to_bib_entries(citations: Iterable[str]) -> List[Dict[str, Any]]

Parse a sequence of citation strings into BibTeX-entry dicts.

Deduplicates by key — the first occurrence wins (matches the replication_pack semantics where inner estimators register citations before outer wrappers).

Examples:

>>> import statspai as sp
>>> entries = sp.citations_to_bib_entries([
...     "Abadie A (2003). Semiparametric IV estimation. Journal of Econometrics.",
...     "Abadie A (2003). Semiparametric IV estimation. Journal of Econometrics.",
... ])
>>> len(entries)  # deduplicated by computed key
1

write_bib ¶

write_bib(citations: Iterable[Union[str, Dict[str, Any]]], path: Union[str, Path], *, append: bool = False, header: bool = True) -> Path

Write a clean BibTeX file from citation strings or entry dicts.

Parameters:

Name	Type	Description	Default
`citations`	`iterable`	Either free-form citation strings (parsed via :func:`parse_citation_to_bib`) or pre-built entry dicts `{"key": ..., "type": ..., "fields": {...}}`.	required
`path`	`str or Path`	Destination `.bib` file. Parent dirs are created.	required
`append`	`bool`	Append to an existing file rather than overwriting.	`False`
`header`	`bool`	Prepend a one-line `% Auto-generated by StatsPAI ...` comment at the top of fresh files (skipped on append).	`True`

Returns:

Type	Description
`Path`	Resolved path of the written file.

Notes

Deduplicates by computed bib key. Pre-built entry dicts are taken as-is; only string citations go through the regex parser.

Examples:

>>> import statspai as sp
>>> import tempfile, os
>>> with tempfile.TemporaryDirectory() as d:
...     out = sp.write_bib(
...         ["Abadie A (2003). Semiparametric IV estimation. "
...          "Journal of Econometrics."],
...         os.path.join(d, "paper.bib"),
...     )
...     "@article" in out.read_text(encoding="utf-8")
True
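Rendering pre-built entry dicts of the shape `{"key": ..., "type": ..., "fields": {...}}` into BibTeX text, with first-occurrence deduplication by key, might look like the sketch below. `render_bib` is a hypothetical helper, the field values are made up for illustration, and the exact whitespace StatsPAI writes is not specified in the text above.

```python
def render_bib(entries):
    """Render entry dicts as BibTeX, keeping the first entry per key."""
    seen, blocks = set(), []
    for entry in entries:
        if entry["key"] in seen:
            continue                       # duplicate key: first one wins
        seen.add(entry["key"])
        fields = ",\n".join(
            f"  {name} = {{{value}}}" for name, value in entry["fields"].items()
        )
        blocks.append(f"@{entry['type']}{{{entry['key']},\n{fields}\n}}")
    return "\n\n".join(blocks) + "\n"

entry = {
    "key": "abadie2003semiparametric",
    "type": "article",
    "fields": {"author": "Abadie, A", "year": "2003",
               "journal": "Journal of Econometrics"},
}
print(render_bib([entry, dict(entry)]))    # second copy is dropped
```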
