statspai.output

output

Output utilities for regression and causal-inference results.

The package is organised by purpose:

Regression-table renderers (4 entry points historically — see PR-B design doc; regtable is the canonical one):

  • :func:regtable — canonical multi-model regression table renderer. Supports text / HTML / LaTeX / Markdown / Quarto / Excel / Word / DataFrame, journal templates, multi-row SE, repro provenance.
  • :func:esttab — Stata estout / esttab clone (use eststo to register, then esttab to print). Thin Stata-flavoured surface.
  • :func:modelsummary — R modelsummary clone (functional API).
  • :func:outreg2 / :class:OutReg2 — Stata outreg2 clone (Excel-first surface).

Single-table helpers:

  • :func:tab — Stata-style tabulate.
  • :func:sumstats — descriptive summary statistics.
  • :func:balance_table — covariate-balance table.
  • :func:mean_comparison — two-group mean comparison with t-test / ranksum / chi2 (lives in mean_comparison.py since v1.6.x — re-exported from regression_table for back-compat).

Multi-table / paper bundles:

  • :func:paper_tables — Main / Heterogeneity / Robustness panels.
  • :class:Collection / :func:collect — narrative document builder.

Plotting:

  • :func:coefplot — coefficient plot.

Provenance / replication / citations:

  • :class:Provenance, :func:attach_provenance, :func:get_provenance, :func:compute_data_hash, :func:format_provenance, :func:lineage_summary.
  • :class:ReplicationPack, :func:replication_pack.
  • :func:cite, :data:CSL_REGISTRY, :func:csl_url, ...

Adapters:

  • :func:to_gt, :func:is_great_tables_available — great_tables adapter (lazy).

RegtableResult

Rich result object for regression tables with multi-format export.

to_latex

to_latex(*, siunitx: bool = False, threeparttable: bool = False, siunitx_preamble: bool = False) -> str

Render the table as a LaTeX table float.

Parameters:

Name Type Description Default
siunitx bool

Decimal-align the numeric columns with siunitx S columns (journal style): coefficients align on the decimal point and significance stars ride along as \textsuperscript. Requires \usepackage{siunitx} (v3). Not supported together with multi_se / eform / apply_coef / cell templates / column_spanners (raises NotImplementedError).

False
threeparttable bool

Wrap the table in threeparttable and move the footnotes into a tablenotes block (requires \usepackage{threeparttable}).

False
siunitx_preamble bool

Prepend a comment line listing the required \usepackage lines.

False

to_markdown

to_markdown(*, quarto: bool = False) -> str

Render the table as Markdown.

Parameters:

Name Type Description Default
quarto bool

When True, append a Quarto cross-reference caption block of the form : <caption> {#tbl-<label>} so the table can be referenced via @tbl-<label> in the manuscript. Requires quarto_label to have been set on the result (typically via regtable(..., quarto_label="main")). Equivalent to calling :meth:to_quarto.

False

to_quarto

to_quarto() -> str

Render as a Quarto-cross-referenceable Markdown table.

Builds on :meth:to_markdown and appends a Quarto caption block of the form::

: <caption> {#tbl-<label>}

which lets the manuscript reference the table via @tbl-<label>. The tbl- prefix is auto-prepended when the user passes a bare quarto_label="main".

Behaviour
  • quarto_label is required. Without it, ValueError is raised — Quarto cross-references need an id.
  • quarto_caption falls back to title when not provided. If neither is set, a generic "Regression results" is used and a warning is emitted.
  • The leading title line is dropped (the caption block replaces it) to avoid duplicating the heading.
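
The caption rules listed above can be sketched in a few lines. This is an illustrative stand-in, not the library's code: the helper name quarto_caption_block and the exact warning text are assumptions; only the ": <caption> {#tbl-<label>}" shape, the tbl- auto-prefix, the ValueError, and the fallback order come from the description above.

```python
import warnings
from typing import Optional


def quarto_caption_block(
    quarto_label: Optional[str],
    quarto_caption: Optional[str] = None,
    title: Optional[str] = None,
) -> str:
    """Build a ': <caption> {#tbl-<label>}' line (hypothetical helper)."""
    if not quarto_label:
        # Quarto cross-references need an id, so a missing label is an error.
        raise ValueError("quarto_label is required for Quarto output")
    # Auto-prepend the tbl- prefix when a bare label such as "main" is passed.
    if quarto_label.startswith("tbl-"):
        label = quarto_label
    else:
        label = f"tbl-{quarto_label}"
    # Caption falls back to the title, then to a generic string with a warning.
    caption = quarto_caption or title
    if caption is None:
        warnings.warn("No quarto_caption or title set; using a generic caption.")
        caption = "Regression results"
    return f": {caption} {{#{label}}}"


block = quarto_caption_block("main", title="Main results")
print(block)  # : Main results {#tbl-main}
```

The manuscript would then refer to the table as @tbl-main.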

to_dataframe

to_dataframe() -> DataFrame

Return the table as a pandas DataFrame.

to_dict

to_dict(*, renders: Union[bool, Sequence[str], None] = None) -> Dict[str, Any]

Return a JSON-safe dict representation of the table.

The package is agent-native (CLAUDE.md §1): a rendered regression table is a first-class artifact an LLM tool loop should be able to serialise, cache, and reason over without re-rendering. The payload carries three layers:

  • metadata — model_labels, dep_var_labels, panel_labels, title, notes, template, se_type, stars / star_levels, requested_stats, coef_labels.
  • table — the rendered cell grid (the same strings :meth:to_dataframe produces: "2.067***", "(0.074)" …), as a list of {"term": ..., <model label>: <cell>} records.
  • models — the numeric truth per model: coefficient estimates, standard errors, t / p values, confidence bounds, summary stats and the dependent variable. Use this layer for machine reasoning; use table for faithful re-display.

Parameters:

Name Type Description Default
renders bool | sequence of str

When truthy, also embed fully rendered strings under "renders". True embeds latex / html / markdown / text; a sequence selects specific formats (e.g. renders=["latex"]). Default None keeps the payload compact.

None

Returns:

Type Description
dict

JSON-safe; round-trips through json.dumps.

Examples:

>>> import statspai as sp
>>> tbl = sp.regtable(m1, m2, template="aer")
>>> payload = tbl.to_dict()
>>> payload["models"][0]["coefficients"]["x"]["estimate"]
2.067
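
To make the three layers concrete, the sketch below hand-builds a dict shaped like the payload described above and reads each layer. It does not call statspai: the numbers are made up, the top-level key names "metadata" and "table" and the inner key "std_error" are assumptions, and only the payload["models"][i]["coefficients"][term]["estimate"] path is taken from the example above.

```python
import json

# Hand-written stand-in for a to_dict() payload; all values are illustrative.
payload = {
    "metadata": {
        "model_labels": ["(1)", "(2)"],
        "title": "Wage regressions",
        "se_type": "se",
        "stars": True,
    },
    # Rendered cell grid: one record per row, keyed by model label.
    "table": [
        {"term": "x", "(1)": "2.067***", "(2)": "1.985***"},
        {"term": "", "(1)": "(0.074)", "(2)": "(0.081)"},
    ],
    # Numeric truth per model ("std_error" is an assumed key name).
    "models": [
        {"coefficients": {"x": {"estimate": 2.067, "std_error": 0.074}}},
        {"coefficients": {"x": {"estimate": 1.985, "std_error": 0.081}}},
    ],
}

# Machine reasoning uses the numeric layer, not the formatted strings.
estimates = [m["coefficients"]["x"]["estimate"] for m in payload["models"]]

# Faithful re-display uses the rendered strings as-is.
first_row = payload["table"][0]

# A JSON-safe payload survives a dumps / loads round-trip unchanged.
restored = json.loads(json.dumps(payload))
```

Keeping the formatted strings and the raw numbers in separate layers means a consumer never has to parse "2.067***" back into a float.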

to_json

to_json(*, indent: Optional[int] = None, renders: Union[bool, Sequence[str], None] = None) -> str

Serialise :meth:to_dict via json.dumps.

from_dict classmethod

from_dict(payload: Dict[str, Any]) -> 'RegtableResult'

Reconstruct a :class:RegtableResult from a :meth:to_dict payload.

The inverse of :meth:to_dict. Rebuilds one normalised model per entry in the models layer (coefficient estimates, SE, t, p, CI, summary stats, dependent variable), re-splits them into panels per render_spec.panel_sizes, and restores the render-controlling metadata (labels, fmt, se_type, stars / star levels, stats, keep / drop / order, add_rows, alpha, template). For a table built without exotic options this round-trips exactly:

>>> import statspai as sp
>>> t = sp.regtable(m1, m2, template="aer")  # doctest: +SKIP
>>> RegtableResult.from_dict(t.to_dict()).to_latex() == t.to_latex()
True

Notes

Exotic features that are not part of the to_dict payload — stacked multi_se rows, eform transforms, column_spanners, tests rows, and custom apply_coef transforms — are NOT preserved across the round-trip (they reconstruct as a plain table). The serialised table / renders layers already capture their rendered form if you only need to re-display, not re-compute.

to_excel

to_excel(filename: str) -> None

Export table to Excel as a strict book-tab three-line table.

Uses the shared _excel_style primitives so the visual output is byte-aligned with sumstats, tab, paper_tables, collection, modelsummary and outreg2: thick top rule above the column header, thin mid rule between header and body, thick bottom rule below the last data row, Times New Roman throughout.

to_word

to_word(filename: str) -> None

Export table to Word (.docx) file in AER/QJE book-tab style.

The exported document follows economics-journal conventions: a heavy top rule, thin mid rule below the header, heavy bottom rule above notes, and no internal vertical borders. Body text is Times New Roman 10pt; the notes paragraph is 8pt italic.

to_docx

to_docx(filename: str) -> None

Alias for :meth:to_word — mirrors Stata outreg2 convention.

save

save(filename: str) -> None

Auto-detect format from file extension and save.
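
Extension-based dispatch of this kind usually reduces to a suffix lookup. The following is a minimal sketch under stated assumptions: detect_format and the format names on the right-hand side are hypothetical, and the extension list is borrowed from the filename parameter of regtable documented on this page, not from the save implementation itself.

```python
from pathlib import Path

# Extension -> format name. The mapping targets are assumed labels.
_FORMATS = {
    ".tex": "latex",
    ".html": "html",
    ".md": "markdown",
    ".qmd": "quarto",
    ".docx": "word",
    ".xlsx": "excel",
    ".csv": "csv",
    ".json": "json",
}


def detect_format(filename: str) -> str:
    """Return the output format implied by the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix not in _FORMATS:
        raise ValueError(f"Unsupported extension: {suffix!r}")
    return _FORMATS[suffix]


print(detect_format("table1.tex"))  # latex
```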

EstimateTableResult

Stata esttab result handle — thin wrapper over a :class:RegtableResult.

Preserves the EstimateTableResult type identity for callers that do isinstance(x, EstimateTableResult). Forwards every render method to the underlying regtable result; adds to_csv() for parity with the legacy esttab API (regtable does not natively expose CSV but the dataframe path is byte-identical to what the legacy esttab produced).

MeanComparisonResult

Rich result object for balance / mean comparison tables.

to_excel

to_excel(filename: str) -> None

Export balance table to Excel as a book-tab three-line table.

to_word

to_word(filename: str) -> None

Export balance table to Word in AER/QJE book-tab style.

save

save(filename: str) -> None

Auto-detect format from extension and save.

PaperTables dataclass

Container for the multi-panel paper-table bundle.

Each attribute is a RegtableResult (has .text, .latex, .html, .to_latex(), ...). Iterate via .panels() or access by name: pt.main / pt.heterogeneity / pt.robustness / pt.placebo.

panels

panels() -> Dict[str, RegtableResult]

Return {name: RegtableResult} for every populated panel.
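
The "every populated panel" behaviour can be pictured with a small dataclass. PanelBundle below is a hypothetical stand-in for PaperTables (plain strings replace RegtableResult objects); it only illustrates filtering out panels that were never set, using the four panel names given above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PanelBundle:
    """Hypothetical stand-in for PaperTables; each field holds one panel."""

    main: Optional[Any] = None
    heterogeneity: Optional[Any] = None
    robustness: Optional[Any] = None
    placebo: Optional[Any] = None

    def panels(self) -> Dict[str, Any]:
        # Keep declaration order and skip panels that were never populated.
        names = ("main", "heterogeneity", "robustness", "placebo")
        return {n: getattr(self, n) for n in names if getattr(self, n) is not None}


bundle = PanelBundle(main="<main table>", robustness="<robustness table>")
print(list(bundle.panels()))  # ['main', 'robustness']
```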

to_latex

to_latex(path: Optional[str] = None) -> str

Concatenate every panel's LaTeX into a single file.

If path is provided, also write to disk.

to_markdown

to_markdown(path: Optional[str] = None) -> str

Return all panels stacked as GitHub-flavoured Markdown.

to_text

to_text() -> str

Return all panels stacked as plain text (terminal-friendly).

to_dict

to_dict() -> Dict[str, Any]

Return a JSON-safe dict representation of the multi-panel bundle.

Agent-native counterpart to the renderers: each populated panel (main / heterogeneity / robustness / placebo) carries the full :meth:RegtableResult.to_dict payload (metadata + rendered grid + numeric truth), so an LLM tool loop can cache and reason over the whole bundle without re-rendering.

to_json

to_json(*, indent: Optional[int] = None) -> str

Serialise :meth:to_dict via json.dumps.

to_docx

to_docx(path: str) -> str

Write every populated panel into a single .docx file.

Each panel renders as an AER/QJE book-tab table (heavy top rule, thin mid rule, heavy bottom rule, no internal borders). Panels are separated by a page break so the file lands on the co-author's desk ready for direct insertion into a manuscript.

Returns the file path that was written.

to_xlsx

to_xlsx(path: str) -> str

Write every populated panel as a separate sheet in one workbook.

Sheet names are the panel names (main / heterogeneity / robustness / placebo). Each sheet uses the AER book-tab style: heavy rules above the header, below the header, and below the last data row.

Returns the file path that was written.

Collection

A named, ordered bundle of tables and prose for a single document.

Add items in any order via the add_* methods (each returns self so calls can be chained). Render to any supported format via save(path) or one of the explicit to_* methods.

Parameters:

Name Type Description Default
title str

Displayed at the top of the rendered document.

None
template ('aer', 'qje', 'econometrica', 'restat')

Forwarded to paper_tables style; also drives the default star levels used by add_regression.

'aer'

list

list() -> DataFrame

Return a DataFrame summary (name / kind / title) for inspection.

to_frame

to_frame(*, include_text: bool = False) -> DataFrame

Return a semantic long-format view of the collection.

This is the programmatic counterpart to Stata's collect layout and R's modelsummary/gt data pipeline: every rendered cell is represented as one row with stable dimensions item / kind / term / statistic / model and both a raw numeric value (when parseable) and the display formatted string.

Parameters:

Name Type Description Default
include_text bool

Include free-form text and headings as rows. Table workflows usually leave this off; document-audit workflows may want it.

False

Returns:

Type Description
DataFrame

Long-format cell table with provenance-friendly dimensions.
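
As a rough picture of the long format, the sketch below turns a rendered cell grid into one row per cell. It is not the to_frame implementation: the function names, the "value" and "formatted" column names, and the rule that a blank term marks the statistic line are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_value(cell: str) -> Optional[float]:
    """Extract the first number from a display string such as '2.067***'."""
    match = _NUMBER.search(cell.replace(",", ""))
    return float(match.group()) if match else None


def to_long(item: str, records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    rows: List[Dict[str, object]] = []
    term = ""
    for record in records:
        # Assumption: a blank term is the statistic line under the previous term.
        statistic = "estimate" if record["term"] else "std_error"
        term = record["term"] or term
        for model, cell in record.items():
            if model == "term":
                continue
            rows.append({
                "item": item,
                "kind": "regression",
                "term": term,
                "statistic": statistic,
                "model": model,
                "value": parse_value(cell),  # raw number when parseable
                "formatted": cell,           # display string, unchanged
            })
    return rows


grid = [
    {"term": "x", "(1)": "2.067***"},
    {"term": "", "(1)": "(0.074)"},
]
long_rows = to_long("main", grid)
```

Each row carries both the parsed number and the original string, so filters can work on values while exports keep the published formatting.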

to_csv

to_csv(path: Optional[str] = None, **kwargs) -> str

Render :meth:to_frame to CSV; optionally write to path.

to_dict

to_dict() -> Dict[str, Any]

Return a JSON-safe dict representation of the whole document.

Agent-native counterpart to :meth:save — every item (regression table, balance / summary table, free text) is serialised in order so an LLM tool loop can cache and reason over a multi-table document without re-rendering. Regression-table items carry the full :meth:RegtableResult.to_dict payload (metadata + rendered grid + numeric truth).

to_json

to_json(*, indent: Optional[int] = None) -> str

Serialise :meth:to_dict via json.dumps.

add_regression

add_regression(*results, name: Optional[str] = None, title: Optional[str] = None, **regtable_kwargs) -> 'Collection'

Add a regression table built from one or more model results.

regtable_kwargs are forwarded verbatim to sp.regtable.

add_table

add_table(result, *, name: Optional[str] = None, title: Optional[str] = None) -> 'Collection'

Add an already-built RegtableResult / MeanComparisonResult.

add_summary

add_summary(data: DataFrame, vars: Optional[Sequence[str]] = None, *, stats: Optional[Sequence[str]] = None, name: Optional[str] = None, title: Optional[str] = None, labels: Optional[Dict[str, str]] = None) -> 'Collection'

Add a descriptive-statistics table built from a DataFrame.

Stores the underlying DataFrame; rendering re-uses the existing sumstats formatters so the AER book-tab style applies.

add_balance

add_balance(data: DataFrame, treatment: str, variables: Sequence[str], *, weights: Optional[str] = None, test: str = 'ttest', name: Optional[str] = None, title: Optional[str] = None, fmt: str = '%.3f') -> 'Collection'

Add a treatment vs. control balance table (calls mean_comparison).

add_text

add_text(text: str, *, name: Optional[str] = None, title: Optional[str] = None) -> 'Collection'

Add a free-form text block (rendered as a paragraph).

add_heading

add_heading(text: str, *, level: int = 2, name: Optional[str] = None) -> 'Collection'

Add a section heading (level 1-3).

to_text

to_text() -> str

Plain-text rendering of every item, top to bottom.

to_markdown

to_markdown(path: Optional[str] = None) -> str

Render to GitHub-flavoured Markdown; optionally write to path.

to_html

to_html(path: Optional[str] = None) -> str

Render to a single self-contained HTML document.

to_latex

to_latex(path: Optional[str] = None) -> str

Concatenate every item's LaTeX into one .tex file.

to_docx

to_docx(path: str) -> str

Write the entire collection to a single .docx file.

Each item renders in turn — headings as Word headings, text as paragraphs, tables in AER book-tab style — separated by page breaks between tables.

to_xlsx

to_xlsx(path: str) -> str

Write the collection to a single workbook (one sheet per item).

save

save(path: str) -> str

Auto-detect format from path extension and write.

CollectionItem dataclass

One entry in a :class:Collection.

Provenance dataclass

A traceable record of how a single estimate was produced.

Attributes:

Name Type Description
function str

Fully qualified function name (e.g. "statspai.did.callaway_santanna").

params dict

JSON-serialisable summary of the call arguments.

data_hash str or None

12-char SHA-256 prefix of the input data, or None when the data was too large to hash (>1M rows) or wasn't a recognised type.

data_shape list[int] or None

[n_rows, n_cols] of the input frame, when known.

run_id str

Per-call uuid4 — disambiguates two structurally identical calls in the same session.

statspai_version str

Package version at the time of the call.

python_version str

"3.11.5"-style.

timestamp str

ISO-8601 wall-clock of when the call returned.
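
The attribute list above maps onto standard-library calls fairly directly. The sketch below assembles a comparable record as a plain dict; make_provenance is a hypothetical helper, it hashes raw bytes instead of a DataFrame, and the version string is a placeholder, so treat it as an illustration of the field formats only.

```python
import hashlib
import platform
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def make_provenance(
    function: str,
    params: Dict[str, Any],
    data_bytes: Optional[bytes],
    data_shape: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Assemble a provenance-style record (hypothetical helper)."""
    data_hash = None
    if data_bytes is not None:
        # 12-character prefix of the SHA-256 hex digest.
        data_hash = hashlib.sha256(data_bytes).hexdigest()[:12]
    return {
        "function": function,
        "params": params,
        "data_hash": data_hash,
        "data_shape": data_shape,
        "run_id": str(uuid.uuid4()),  # unique per call
        "statspai_version": "0.0.0",  # placeholder, not a real version
        "python_version": platform.python_version(),
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601
    }


record = make_provenance(
    "statspai.did.callaway_santanna", {"alpha": 0.05}, b"a,b\n1,2\n", [1, 2]
)
```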

to_dict

to_dict() -> Dict[str, Any]

Plain-dict view, suitable for JSON dumping.

short

short() -> str

One-line human summary.

ReplicationPack

Lightweight summary returned by :func:replication_pack.

Mostly for testing / programmatic inspection; users typically just care about output_path.

regtable

regtable(*args, panel_labels: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, dep_var_labels: Optional[List[str]] = None, model_labels: Optional[List[str]] = None, keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, stats: Optional[Sequence[str]] = None, se_type: str = 'se', stars: bool = True, star_levels: Optional[Tuple[float, ...]] = None, fmt: str = '%.3f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[List[str]] = None, add_rows: Optional[Dict[str, List[str]]] = None, alpha: float = 0.05, template: Optional[str] = None, diagnostics: Union[str, bool] = 'auto', multi_se: Optional[Dict[str, Sequence[Any]]] = None, repro: Union[bool, Dict[str, Any], None] = None, quarto_label: Optional[str] = None, quarto_caption: Optional[str] = None, eform: Union[bool, Sequence[bool]] = False, column_spanners: Optional[Sequence[Tuple[str, int]]] = None, coef_map: Optional[Dict[str, str]] = None, consistency_check: bool = True, estimate: Optional[str] = None, statistic: Optional[str] = None, notation: Union[str, Tuple[str, ...]] = 'stars', apply_coef: Optional[Any] = None, apply_coef_deriv: Optional[Any] = None, escape: bool = True, tests: Optional[Dict[str, Sequence[Any]]] = None, fixef_sizes: bool = False, vcov: Optional[str] = None, transpose: bool = False) -> RegtableResult

Unified publication-quality regression table.

Accepts model results as positional arguments. If the positional arguments are lists of results, each list is treated as a separate panel.

Parameters:

Name Type Description Default
*args model results or lists of model results

EconometricResults, CausalResult, or any duck-typed object with params / std_errors attributes. Pass multiple lists to create a multi-panel table.

()
panel_labels list of str

Labels for each panel (e.g., ["Panel A: Wages", "Panel B: Hours"]).

None
coef_labels dict

Rename variables: {"education": "Years of Education"}.

None
dep_var_labels list of str

Dependent variable labels shown below column headers.

None
model_labels list of str

Column header labels. Defaults to (1), (2), ....

None
keep list of str

Only show these variables.

None
drop list of str

Hide these variables.

None
order list of str

Reorder variables.

None
stats list of str

Summary statistics. Defaults to ["N", "R2", "adj_R2", "F"].

None
se_type str

What to show beneath coefficients: "se", "t", "p", or "ci" for confidence intervals.

"se"
stars bool

Append significance stars.

True
star_levels tuple

Thresholds for *, **, ***.

(0.10, 0.05, 0.01)
fmt str

Format string for numeric values. Pass any C-style format ("%.0f", "%.4f", ...) for fixed precision, or "auto" for magnitude-adaptive precision (recommended when a single table mixes dollar-magnitude coefficients like 1521 with elasticity-magnitude coefficients like 0.288 — fixed "%.0f" would round the latter to 0).

"%.3f"
output str

Controls what str(result) / repr(result) / print(result) returns — one of "text", "latex", "html", "markdown", "quarto", "word", "excel". In Jupyter, _repr_html_ always renders HTML regardless of this setting.

"text"
filename str

Save the table to this file path. The format is chosen from the file extension (.tex/.html/.md/.qmd/.docx/.xlsx/.csv/.json), independently of output=. The .json form writes the agent-native :meth:RegtableResult.to_dict payload. Pass a matching extension and output= to avoid surprises.

None
quarto_label str

Quarto cross-reference id. Pass "main" to make the table referenceable as @tbl-main from the manuscript prose. The tbl- prefix is auto-prepended when missing. Required for to_quarto() and output="quarto".

None
quarto_caption str

Caption rendered alongside the Quarto cross-ref id. Falls back to title when omitted; if both are absent, a generic "Regression results" is used and a warning is emitted.

None
title str

Table title / caption.

None
notes list of str

Additional notes beneath the table.

None
add_rows dict

Custom rows: {"Controls": ["No", "Yes", "Yes"]}. User-provided rows take precedence over auto-extracted diagnostic rows with the same label.

None
alpha float

Significance level used when se_type='ci'. Displayed CI is (1 - alpha) * 100%. With alpha=0.05 (default) the bounds come from the model's stored 95% CI; for any other alpha the bounds are recomputed as b ± crit · se, using the t-distribution when df_resid is known, else the standard normal.

0.05
template str

Journal preset name. One of "aer", "qje", "econometrica", "restat", "jf", "aeja", "jpe", "restud". When set, fills in defaults for star_levels, the SE-row footer label (e.g. QJE → "Robust standard errors"), the default stats selection (e.g. JF/AEJA include Adj. R²), and any extra notes — but every explicit kwarg you pass still wins. See :data:statspai.output._journals.JOURNALS.

None
diagnostics ('auto', 'off')

Auto-extract publication-quality diagnostic rows from the result objects:

  • FE / Cluster indicators — one row per distinct fixed effect variable (AER style: "Firm FE: Yes/No", "Year FE: Yes/No"; interactions render as "Firm × Year FE"), plus "Cluster SE: <var>". Falls back to a single "Fixed Effects: Yes/No" row when FE metadata is present but unparseable.
  • IV — first-stage F (Olea-Pflueger / KP), Hansen-J p.
  • DiD — pre-trend p-value, treated-group count.
  • RD — bandwidth, kernel, polynomial order.

"auto" (and True) emit only rows where at least one column produces a non-empty cell; False / "off" disables all auto-extraction. User-supplied add_rows always override.

'auto'
multi_se dict

Stack additional SE specifications under the primary SE row. Keys are display labels (e.g. "Cluster SE", "Bootstrap SE") and values are sequences of :class:pandas.Series or dicts (one per model column) mapping coefficient names to SE values. Bracket styles cycle []/{}/⟨⟩/||. Footer notes record each label automatically.

None
repro bool or dict

Append a reproducibility metadata note (StatsPAI version, optional seed and data hash, timestamp) as the last footer line. True emits the version + timestamp only. Pass a dict to record more: {"data": df, "seed": 42, "extra": "git@<sha>"}.

None
eform bool or list of bool

Report exponentiated coefficients — odds ratios for logit / probit, incidence-rate ratios for poisson, hazard ratios for Cox-style models. Standard errors use the delta method (exp(b)·SE(b)), CI bounds are (exp(lo), exp(hi)) of the original endpoints, and t / p values are unchanged because H_0: b=0 is equivalent to H_0: exp(b)=1. Pass a per-model list (length matches n_models) to mix transformed and untransformed columns (e.g. logit + OLS in the same table). A footer note transparently flags which columns are exponentiated.

False
column_spanners list of (label, span)

Multi-row header above the model labels — each tuple groups span consecutive columns under label. Spans must partition all model columns (sum equals n_models). Renders as \multicolumn{n}{c}{label} + \cmidrule in LaTeX, colspan="n" in HTML, repeated bold cells in Markdown, and centered ASCII in text. Word and Excel exports inherit to_dataframe()'s flat column model and currently omit the spanner row — use the LaTeX or HTML output for paper-grade spanners. Mirrors Stata mgroups() and R modelsummary's group argument. Example: column_spanners=[("OLS", 2), ("IV", 2)] over four models.

None
coef_map dict

Single-shot rename + reorder + drop. Mirrors R modelsummary's coef_map: pass an ordered dict whose keys are coefficient names to keep (in display order) and values are the rendered labels. Variables not in coef_map are dropped. Mutually exclusive with coef_labels / keep / drop / order — pass either the unified map or the legacy four-parameter spec.

None
consistency_check bool

When two or more columns are passed and their sample sizes differ, emit a UserWarning, because an unexplained N-mismatch is a reviewer red flag. Set consistency_check=False (or annotate with notes=[...]) when the mismatch is intentional (IV first stage on a subsample, RD bandwidth restriction, etc.).

True
estimate str

Custom format string for the top (coefficient) line in each cell. Mirrors R modelsummary's estimate= argument. Placeholders: {estimate}, {stars}, {std_error}, {t_value}, {p_value}, {conf_low}, {conf_high}. Default "{estimate}{stars}". Pass e.g. "{stars}{estimate}" for stars-first, or "{estimate} ({std_error}){stars}" for an inline single-line cell.

None
statistic str

Custom format string for the bottom (statistic) line in each cell. Same placeholders as estimate. Default depends on se_type: "({std_error})" for se, "[{conf_low}, {conf_high}]" for ci, etc. Pass e.g. "t={t_value}, p={p_value}" for working-paper-style cells.

None
notation "stars" | "symbols" | tuple of 3 strings

Family of significance markers used when stars=True. "stars" (default) → ("*", "**", "***"); "symbols" → ("†", "‡", "§") (AER / JPE alternative when stars conflict with footnote markers); a 3-tuple of custom strings is accepted, ordered low → high.

'stars'
apply_coef callable

Apply an arbitrary transformation f(b) to each rendered coefficient. Generalises eform (which is shorthand for apply_coef=np.exp). Useful for percentage transforms (apply_coef=lambda b: 100*b), log scales, or signed sqrt for distortion measures. Pair with apply_coef_deriv for delta-method SE rescaling. Mutually exclusive with eform.

None
apply_coef_deriv callable

Derivative f'(b) of the apply_coef callable. When provided, SEs are rescaled as |f'(b)| · SE(b). When omitted, SEs stay on the original scale and a footer warns the reader.

None
escape bool

Auto-escape user-supplied label strings (coef_labels, model_labels, panel_labels, dep_var_labels, column_spanners labels, add_rows labels and values, title) for the active output format (LaTeX / HTML). Pass escape=False when those strings already contain raw markup you want to preserve verbatim — e.g. math-mode coefficient names like "$\beta_1$", or HTML tags like "<i>β</i>". Cell content (numeric estimates, computed stats) is unaffected; it never contains user-controlled metacharacters.

True
tests dict

Render hypothesis-test rows in the diagnostic strip below the stats block. Keys are display labels ("F-test x1=0", "Hansen J p-value", "Wald χ²"); values are sequences whose length equals n_models. Each per-model entry can be:

  • (statistic, pvalue) tuple → "<stat>***" (stars from p)
  • bare pvalue float → "<p>***"
  • None / NaN → empty cell
  • any string → passed through as-is

Stars honour the configured notation family for cross-table consistency. Closes the gap to Stata estadd scalar / test integration where reviewers expect Wald / Sargan / Hansen-J / first-stage F right under the main results block.

None
fixef_sizes bool

Auto-emit "# Firm: 1,234" / "# Year: 30" rows showing the number of distinct levels per fixed effect. Reads model_info['n_fe_levels'] from each result — currently populated by count.py (Poisson/NegBin) and the pyfixest adapter; other estimators silently no-op. Mirrors R fixest's etable(..., fixef_sizes=TRUE).

False
vcov str

Recompute the SE / t / p / 95% CI columns at print time using a different variance estimator. Currently supports OLS-style results that store data_info['X'] and data_info['residuals']:

  • "HC0" — White heteroskedasticity-robust
  • "HC1" / "robust" — Stata's robust (HC0 × n/(n-k))
  • "HC2" — leverage-weighted
  • "HC3" — leverage-squared (recommended for small samples; Long-Ervin)

Columns whose underlying result lacks the X/residuals fields emit a UserWarning and retain their fit-time SEs, so a heterogeneous mix of OLS + non-OLS does not blow up — the warning lists the affected columns so the user can audit.

None
transpose bool

Render with axes swapped: rows become models, columns become variables. Single-panel only; multi-panel input or multi_se= is rejected with NotImplementedError to keep the layout pivot semantics tight. Currently supports text and HTML renderers.

False
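
The alpha parameter above says that, for any alpha other than 0.05, the bounds are recomputed as b ± crit · se. The sketch below shows only the standard-normal fallback, using the standard library; the t-distribution branch (used when df_resid is known) is left out because the standard library has no t quantile function. The helper name normal_ci is an assumption.

```python
from statistics import NormalDist
from typing import Tuple


def normal_ci(b: float, se: float, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided (1 - alpha) interval b ± z * se under a normal approximation."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.645 for alpha=0.10
    return b - crit * se, b + crit * se


# 90% interval for a coefficient of 2.0 with standard error 1.0.
lo90, hi90 = normal_ci(2.0, 1.0, alpha=0.10)
```

With a small residual degrees of freedom the t critical value is larger than z, so the normal version gives a slightly narrower interval.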

Returns:

Type Description
RegtableResult

Object with .to_text(), .to_latex(), .to_html(), .to_markdown(), .to_excel(filename), .to_word(filename), .to_dataframe(), .to_dict() / .to_json() (agent-native), .save(filename) methods. Renders as rich HTML in Jupyter notebooks via _repr_html_().
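
The eform and apply_coef / apply_coef_deriv parameters described above rest on the same delta-method arithmetic: the coefficient becomes f(b) and the standard error becomes |f'(b)| · SE(b), with eform as the special case f = f' = exp. The helper names below are assumptions and the numbers are made up; this is a sketch of the arithmetic, not the library's code path.

```python
import math
from typing import Callable, Tuple


def transform_coef(
    b: float,
    se: float,
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (f(b), |f'(b)| * se), the delta-method rescaling."""
    return f(b), abs(f_prime(b)) * se


def eform_ci(lo: float, hi: float) -> Tuple[float, float]:
    """Exponentiate the original interval endpoints."""
    return math.exp(lo), math.exp(hi)


# eform: odds ratio and its delta-method SE, exp(b) * SE(b).
odds_ratio, or_se = transform_coef(0.5, 0.1, math.exp, math.exp)

# Percentage transform: f(b) = 100 * b, so f'(b) = 100.
pct, pct_se = transform_coef(0.5, 0.1, lambda b: 100 * b, lambda b: 100.0)

ci_lo, ci_hi = eform_ci(0.3, 0.7)
```

Exponentiating the endpoints, as in eform_ci, keeps the interval positive; it is generally not symmetric around exp(b).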

Examples:

>>> import statspai as sp
>>> m1 = sp.regress("y ~ x1", data=df)
>>> m2 = sp.regress("y ~ x1 + x2", data=df)
>>> sp.regtable(m1, m2)
>>> sp.regtable(m1, m2, output="latex", filename="table1.tex")
>>> sp.regtable([m1, m2], [m3, m4],
...     panel_labels=["Panel A: OLS", "Panel B: IV"])
>>>
>>> # Logit odds ratios
>>> sp.regtable(sp.logit("y ~ x", data=df), eform=True)
>>>
>>> # IV three-block table with column spanners
>>> sp.regtable(
...     ols1, ols2, iv1, iv2,
...     column_spanners=[("OLS", 2), ("IV", 2)],
...     stats=["N", "R2", "depvar_mean", "depvar_sd"],
... )
>>>
>>> # Unified coef_map (rename + order + drop in one shot)
>>> sp.regtable(m1, m2, coef_map={
...     "x2": "Education",
...     "x1": "Experience",
...     "Intercept": "Constant",
... })
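
The four entry types accepted by the tests parameter can be illustrated with a small formatter. This is a sketch under assumptions: the function names are hypothetical, the strict "<" comparison against each threshold is a guess, and the default star_levels and markers are the ones documented above.

```python
import math
from typing import Any, Sequence


def stars_for(
    p: float,
    star_levels: Sequence[float] = (0.10, 0.05, 0.01),
    notation: Sequence[str] = ("*", "**", "***"),
) -> str:
    """Pick the strongest marker whose threshold the p-value clears."""
    marker = ""
    for level, symbol in zip(star_levels, notation):
        if p < level:  # assumption: strict inequality
            marker = symbol
    return marker


def format_test_cell(entry: Any, fmt: str = "%.3f") -> str:
    if entry is None:
        return ""                           # None -> empty cell
    if isinstance(entry, str):
        return entry                        # strings pass through as-is
    if isinstance(entry, tuple):
        stat, p = entry                     # (statistic, pvalue)
        return fmt % stat + stars_for(p)    # statistic shown, stars from p
    if math.isnan(entry):
        return ""                           # NaN -> empty cell
    return fmt % entry + stars_for(entry)   # bare p-value


cells = [format_test_cell(e) for e in [(12.345, 0.003), 0.04, None, "n/a"]]
print(cells)  # ['12.345***', '0.040**', '', 'n/a']
```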

eststo

eststo(result, *, name: Optional[str] = None) -> None

Store a model result (like Stata's estimates store).

estclear

estclear() -> None

Clear all stored model results.

esttab

esttab(*results, names: Optional[Sequence[str]] = None, se: bool = True, t: bool = False, p: bool = False, ci: bool = False, stars: bool = True, star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, labels: Optional[Dict[str, str]] = None, stats: Optional[Sequence[str]] = None, fmt: str = '%.4f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[Sequence[str]] = None, alpha: float = 0.05) -> EstimateTableResult

Stata-style esttab — thin facade over :func:sp.regtable.

Deprecated: now a thin wrapper over :func:statspai.output.regtable. Use sp.regtable(*models, ...) directly for full control. See the module docstring for the parameter mapping.

Accepts model results directly as positional arguments or reads from the global store populated by :func:eststo. If both positional arguments and a non-empty store exist, positional arguments take precedence.

Parameters mirror the original esttab API; see the module docstring for the exact mapping to regtable.
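
The interplay between the global store and positional arguments can be sketched without the library. The functions below reuse the names eststo and estclear for readability but are stand-alone illustrations: the dict-based store, the est1 / est2 auto-naming, and the resolve_models helper are assumptions, and only the "positional arguments take precedence" rule comes from the text above.

```python
from typing import Any, Dict, List, Optional

_STORE: Dict[str, Any] = {}  # module-level store, insertion-ordered


def eststo(result: Any, *, name: Optional[str] = None) -> None:
    """Register a result under a name (auto-named est1, est2, ... if omitted)."""
    _STORE[name or f"est{len(_STORE) + 1}"] = result


def estclear() -> None:
    """Forget every stored result."""
    _STORE.clear()


def resolve_models(*results: Any) -> List[Any]:
    """Positional results win; otherwise fall back to the store."""
    return list(results) if results else list(_STORE.values())


estclear()
eststo("model-1")
eststo("model-2", name="iv")
from_store = resolve_models()             # ['model-1', 'model-2']
from_args = resolve_models("model-3")     # ['model-3']
```

Because the store is global, calling estclear() between tables avoids carrying stale models into the next one.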

coefplot

coefplot(*models, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, ax=None, figsize: tuple = (8, 6), colors: Optional[List[str]] = None, title: Optional[str] = None, alpha: float = 0.05)

Forest plot comparing coefficients across models.

Parameters:

Name Type Description Default
*models

Model result objects.

()
model_names list of str
None
variables list of str

Which variables to plot. Default: all shared variables.

None
ax matplotlib Axes
None
figsize tuple
(8, 6)
colors list of str
None
title str
None
alpha float

Significance level for CIs.

0.05

Returns:

Type Description
(fig, ax)

coefplot_tikz

coefplot_tikz(*models, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, level: float = 0.95, title: Optional[str] = None, xlabel: str = 'Coefficient estimate', standalone: bool = False) -> str

Return pgfplots / TikZ source for a coefficient forest plot.

The vector-graphics, LaTeX-native counterpart to :func:coefplot (which returns a Matplotlib (fig, ax) you can fig.savefig("plot.pdf") / .png). Each model becomes one \addplot series of point estimates with horizontal confidence-interval error bars; variables run down the y-axis with a dashed reference line at zero — the same forest layout as :func:coefplot, emitted as editable LaTeX.

Parameters:

Name Type Description Default
*models

Model result objects (EconometricResults / CausalResult / any object exposing params / std_errors).

()
model_names list of str

Legend labels. Default Model 1, Model 2, …

None
variables list of str

Which coefficients to plot. Default: every shared variable, sorted.

None
coef_labels dict

Rename variables on the y-axis, e.g. {"x": "Treatment"}.

None
level float

Confidence level for the error bars (normal approximation b ± z·se, matching :func:coefplot).

0.95
title str

Plot title. Default "Coefficient plot".

None
xlabel str

x-axis label.

"Coefficient estimate"
standalone bool

When True, wrap the tikzpicture in a compilable standalone document (with the required \usepackage{pgfplots}). Otherwise return just the tikzpicture to \input into a paper (needs \usepackage{pgfplots} in the preamble).

False

Returns:

Type Description
str

pgfplots source.

Examples:

>>> import statspai as sp
>>> m1 = sp.regress("y ~ x + z", data=df)
>>> tikz = sp.coefplot_tikz(m1, coef_labels={"x": "Treatment"})
>>> with open("coefplot.tex", "w") as f:
...     f.write(tikz)
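
The level parameter maps to error-bar bounds through the normal approximation b ± z·se described above. A minimal sketch of that arithmetic using only the standard library; normal_ci is a hypothetical helper for illustration, not part of StatsPAI:

```python
from statistics import NormalDist


def normal_ci(b, se, level=0.95):
    """Two-sided normal-approximation interval b ± z·se (illustrative helper)."""
    # z is the standard-normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return b - z * se, b + z * se


lo, hi = normal_ci(0.50, 0.10, level=0.95)
print(f"[{lo:.3f}, {hi:.3f}]")
```

Raising level widens the bars because z grows with the requested coverage.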

balance_table

balance_table(data: DataFrame, treat: str, covariates: List[str], output: str = 'text', title: str = 'Balance Table', fmt: str = '%.3f', labels: Optional[Dict[str, str]] = None, test: str = 'ttest') -> Union[str, DataFrame]

Generate a balance table comparing treated and control groups.

Standard Table 1 for matching, DID, and RCT papers.

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
treat str

Binary treatment variable (0/1).

required
covariates list of str

Variables to check balance on.

required
output str

'text', 'latex', 'html', 'dataframe', or filepath.

'text'
title str

Table title.

'Balance Table'
fmt str

Number format.

'%.3f'
labels dict

Variable labels.

None
test str

Test for difference: 'ttest' or 'ranksum'.

'ttest'

Returns:

Type Description
str or DataFrame

Examples:

>>> sp.balance_table(df, treat='treated',
...                  covariates=['age', 'edu', 'income'],
...                  output='balance.docx')
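
For intuition about what a 'ttest' row compares, the sketch below computes the group means, their difference, and a Welch t-statistic for one covariate on toy data. Whether balance_table uses the Welch or the pooled-variance form is not stated here, so treat the variance formula as an assumption; mean_diff_t is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def mean_diff_t(treated, control):
    """Difference in means and Welch t-statistic (unequal variances assumed)."""
    diff = mean(treated) - mean(control)
    # Standard error of the difference under the Welch (unpooled) formula.
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, diff / se


# Toy covariate values, for illustration only.
age_treated = [30, 34, 38]
age_control = [28, 30, 32]
diff, t_stat = mean_diff_t(age_treated, age_control)
print(f"diff={diff:.3f}, t={t_stat:.3f}")
```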

collect

collect(title: Optional[str] = None, *, template: str = 'aer') -> Collection

Construct a fresh :class:Collection.

Convenience factory mirroring Stata 17's collect workflow:

    import statspai as sp
    c = sp.collect("Wage analysis")
    c.add_regression(m1, m2, name="main")
    c.add_summary(df, vars=["wage", "educ"], name="desc")
    c.save("paper.docx")

cite

cite(result, term: Optional[str] = None, *, fmt: str = '%.3f', output: str = 'text', star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), second_row: str = 'se', alpha: float = 0.05, bold_estimate: bool = False) -> str

Format a single coefficient as an inline citation string.

Parameters:

Name Type Description Default
result object

Any StatsPAI result with either params/std_errors (econometric) or estimate/se (causal).

required
term str

Coefficient name. Defaults to the headline estimand for causal results, or the first row of params for econometric results.

None
fmt str

printf-style format string.

'%.3f'
output str

One of "text", "latex", "markdown"/"md", "html".

'text'
star_levels tuple of float

Star thresholds — same convention as :func:sp.regtable.

(0.10, 0.05, 0.01)
second_row str

What to put in parentheses after the estimate. One of:

  • "se" — standard error in (...).
  • "t" — t-statistic in (...).
  • "p" — p-value in (...).
  • "ci" — confidence interval in [lo, hi].
  • "none" — omit the second row entirely.
'se'
alpha float

Significance level for the confidence interval when second_row="ci"; the default 0.05 corresponds to a 95% interval.

0.05
bold_estimate bool

For output="text" / "latex", whether to bold the estimate (HTML / Markdown bold the estimate by default for readability).

False

Returns:

Type Description
str

The formatted inline citation string.
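
As a rough illustration of how the pieces combine into one string, the sketch below formats an estimate with stars and a parenthesised second row. It is not StatsPAI's implementation: inline_cite and star_string are hypothetical helpers, only the "se" and "none" second rows are covered, and the strict p < threshold comparison is an assumption.

```python
def star_string(p, star_levels=(0.10, 0.05, 0.01)):
    """One star per threshold that p falls below (assumed strict inequality)."""
    return "*" * sum(p < level for level in star_levels)


def inline_cite(estimate, se, p, fmt="%.3f", second_row="se"):
    """Plain-text inline citation built from an estimate, its SE, and a p-value."""
    text = (fmt % estimate) + star_string(p)
    if second_row == "se":
        text += " (" + (fmt % se) + ")"
    elif second_row != "none":
        raise ValueError(f"unsupported second_row in this sketch: {second_row!r}")
    return text


print(inline_cite(0.123, 0.045, p=0.006))
print(inline_cite(0.123, 0.045, p=0.2, second_row="none"))
```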

list_journal_templates

list_journal_templates() -> Tuple[str, ...]

Return the canonical names of every registered journal preset.

get_journal_template

get_journal_template(name: str) -> Dict[str, Any]

Look up a journal preset by name (case-insensitive).

Parameters:

Name Type Description Default
name str

Template identifier (e.g. "aer" / "AER" / "jf").

required

Returns:

Type Description
dict

A copy of the preset entry. Callers are free to mutate it.

Raises:

Type Description
ValueError

If name is not a registered template.

attach_provenance

attach_provenance(result: Any, *, function: str, params: Optional[Mapping[str, Any]] = None, data: Optional[Any] = None, enabled: bool = True, overwrite: bool = False) -> Any

Attach a :class:Provenance record as result._provenance.

Parameters:

Name Type Description Default
result object

The estimator result. Must accept attribute assignment; CausalResult / ResultBase / dataclasses / SimpleNamespace all work. Tuples / dicts / immutable types do not — for those the call is a silent no-op.

required
function str

Logical name of the producing call, e.g. "statspai.did.callaway_santanna".

required
params mapping

Call arguments. Will be summarised (frames hashed; long sequences truncated; non-serialisable values reduced to repr).

None
data DataFrame / Series / ndarray

The estimator's input data. Used to compute a 12-char SHA-256 fingerprint.

None
enabled bool

Set to False to skip provenance entirely (zero-overhead path).

True
overwrite bool

If False (default) and result._provenance already exists, do nothing — preserves the first (most-specific) record set by an inner estimator.

False

Returns:

Name Type Description
result same object

Returned for chaining: return attach_provenance(res, ...).

Notes

Failures are swallowed. Provenance must never break the caller — if attribute assignment isn't possible, we no-op and move on.
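
The behaviour described above (first record wins unless overwrite is set, silent no-op on objects that reject attribute assignment) can be sketched with a small stand-in. attach_record is hypothetical and stores a plain dict rather than a Provenance instance.

```python
from types import SimpleNamespace


def attach_record(result, record, *, enabled=True, overwrite=False):
    """Attach `record` as result._provenance, never raising (illustrative only)."""
    if not enabled:
        return result
    try:
        if getattr(result, "_provenance", None) is not None and not overwrite:
            return result  # keep the first (inner-most) record
        result._provenance = record
    except Exception:
        pass  # tuples, dicts, etc. reject attribute assignment: no-op
    return result


res = attach_record(SimpleNamespace(), {"function": "inner"})
attach_record(res, {"function": "outer"})   # ignored: a record already exists
print(res._provenance["function"])

pair = (1, 2)
print(attach_record(pair, {"function": "x"}) is pair)  # returned unchanged
```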

get_provenance

get_provenance(result: Any) -> Optional[Provenance]

Return result._provenance if present, else None.

Walks one level of common containers (dict, list, tuple) — useful when an estimator returns a tuple (result, diagnostics).
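
A one-level container walk could look like the sketch below. find_provenance is a hypothetical stand-in, and returning the first record found among the container's items is an assumption.

```python
from types import SimpleNamespace


def find_provenance(result):
    """Return result._provenance, looking one level into dict/list/tuple."""
    prov = getattr(result, "_provenance", None)
    if prov is not None:
        return prov
    if isinstance(result, dict):
        children = result.values()
    elif isinstance(result, (list, tuple)):
        children = result
    else:
        return None
    for child in children:
        prov = getattr(child, "_provenance", None)
        if prov is not None:
            return prov  # assumed: first hit wins
    return None


fitted = SimpleNamespace(_provenance={"function": "demo"})
diagnostics = {"n_obs": 100}
print(find_provenance((fitted, diagnostics)))
```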

compute_data_hash

compute_data_hash(data: Any, length: int = 12) -> Optional[str]

Return a short SHA-256 fingerprint of data, or None.

Accepts:

  • pandas.DataFrame: order- and column-name-sensitive hash.
  • pandas.Series: hashed via hash_pandas_object.
  • numpy.ndarray: bytes-hashed (shape-included).
  • bytes / bytearray: direct SHA-256.

Anything else returns None rather than raising — provenance must never break the calling estimator.
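
For the bytes / bytearray case, a truncated SHA-256 fingerprint can be sketched with hashlib alone. short_hash is a hypothetical helper, and the hexadecimal encoding of the fingerprint is an assumption; the pandas and NumPy branches are omitted.

```python
import hashlib


def short_hash(data, length=12):
    """First `length` hex characters of SHA-256 for raw bytes, else None."""
    if isinstance(data, (bytes, bytearray)):
        return hashlib.sha256(bytes(data)).hexdigest()[:length]
    return None  # unsupported input: signal "no fingerprint" instead of raising


print(short_hash(b"abc"))
print(short_hash("not bytes"))
```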

format_provenance

format_provenance(prov: Provenance, *, indent: int = 2) -> str

Pretty multi-line rendering of a :class:Provenance record.

lineage_summary

lineage_summary(*results: Any) -> Dict[str, Any]

Aggregate a lineage report across multiple results.

Useful for sp.replication_pack / Quarto appendix generation: pass every fitted result the paper depends on and get back a {run_id: provenance_dict} map plus a deduped list of input data hashes.
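
The shape of such a report can be sketched over plain dicts. The field name data_hash and the output keys runs / data_hashes are assumptions made for illustration; summarize_lineage is a hypothetical stand-in.

```python
def summarize_lineage(*records):
    """Map run_id -> record and collect input hashes without duplicates."""
    runs = {}
    hashes = []
    for rec in records:
        runs[rec["run_id"]] = rec
        h = rec.get("data_hash")
        if h is not None and h not in hashes:
            hashes.append(h)  # keep first-seen order
    return {"runs": runs, "data_hashes": hashes}


report = summarize_lineage(
    {"run_id": "r1", "data_hash": "aaa111"},
    {"run_id": "r2", "data_hash": "aaa111"},   # same input data as r1
    {"run_id": "r3", "data_hash": "bbb222"},
)
print(sorted(report["runs"]), report["data_hashes"])
```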

replication_pack

replication_pack(target: Any, output_path: Union[str, PathLike], *, data: Optional[Any] = None, code: Optional[str] = None, env: bool = True, bib: bool = True, paper_format: str = 'auto', title: str = 'Replication Pack', extra_files: Optional[Mapping[str, Union[str, bytes]]] = None, include_git_sha: bool = True, overwrite: bool = True) -> ReplicationPack

Build a replication archive.

Parameters:

Name Type Description Default
target object

Anything carrying analysis state. Best when it's a :class:statspai.workflow.paper.PaperDraft (we then auto-extract the rendered paper, the workflow's data, and any results with provenance) but plain estimator results, lists thereof, or even None (for "just pack data + script") all work.

required
output_path str or PathLike

Destination .zip path. Created or overwritten.

required
data DataFrame / Series

Explicit dataset. When omitted, we try target.data / target.workflow.data. If both fail, the archive omits the data/ directory and warns in MANIFEST.json.

None
code str or PathLike

Either an inline Python script (multi-line string) or a path to a .py file. When omitted, code/ is also omitted and a warning is logged.

None
env bool

Include env/requirements.txt from pip freeze. Disable to keep the archive small or to avoid the subprocess call.

True
bib bool

Write paper/paper.bib from target.citations (or equivalent).

True
paper_format str

One of "auto", "md", "qmd", "tex", "docx": how to render the PaperDraft inside the archive. "auto" picks draft.fmt (and falls back to "md" for unknown formats).

'auto'
title str

Used in README.md.

'Replication Pack'
extra_files mapping

{"path/in/zip.txt": "contents" or b"bytes"} — anything custom you want stuffed into the archive.

None
include_git_sha bool

Capture git rev-parse HEAD for MANIFEST.json (silently skipped when not in a repo).

True
overwrite bool

Overwrite an existing archive at output_path.

True

Returns:

Type Description
ReplicationPack

Summary object. rp.output_path is the on-disk archive; rp.manifest is the parsed MANIFEST.json; rp.warnings lists any partial-success notes.

Examples:

>>> import statspai as sp
>>> draft = sp.paper(df, "effect of training on wages",
...                  treatment="trained", y="wage")
>>> rp = sp.replication_pack(draft, "training-replication.zip")
>>> print(rp.summary())
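
To make the extra_files mapping concrete, the sketch below writes str and bytes entries plus a manifest into an in-memory zip using the standard library. build_archive is a hypothetical helper, and the manifest fields shown are assumptions, not the actual MANIFEST.json schema.

```python
import io
import json
import zipfile


def build_archive(extra_files, warnings=()):
    """Return zip bytes containing the given files and a small manifest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in extra_files.items():
            zf.writestr(name, content)  # accepts both str and bytes
        manifest = {"files": sorted(extra_files), "warnings": list(warnings)}
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return buf.getvalue()


blob = build_archive(
    {"notes/readme.txt": "hello", "data/raw.bin": b"\x00\x01"},
    warnings=["no code supplied"],
)
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(sorted(zf.namelist()))
```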

to_gt

to_gt(result: Any, *, title: Optional[str] = None, subtitle: Optional[str] = None, notes: Optional[Sequence[str]] = None, template: Optional[str] = None, rowname_col: Optional[str] = None, apply_theme: bool = True) -> 'gt_pkg.GT | list'

Convert a StatsPAI table / DataFrame / Collection into a great_tables.GT.

Dispatches on the input type:

  • :class:statspai.output.RegtableResult — full-fidelity adapter that picks up the table's rendered cells, journal preset, title, and footer notes.
  • :class:statspai.output.PaperTables — flattens panels into a single GT with tab_row_group per panel.
  • :class:statspai.output.Collection: converts each convertible item and returns a list[GT]; convertible items include RegtableResult, MeanComparisonResult, and any object with to_dataframe().
  • :class:pandas.DataFrame — wraps verbatim. rowname_col promotes a column to row labels.
  • Anything else with a to_dataframe() method — calls it.

Parameters:

Name Type Description Default
result object

See dispatch description above.

required
title str

Override the table title. Default: pulled from result.title when present.

None
subtitle str

Subtitle line (tab_header(subtitle=...)).

None
notes sequence of str

Footer notes. When omitted, result.notes is used.

None
template str

Journal preset ("aer" / "qje" / …). Default: pulled from result.template when present.

None
rowname_col str

Column to elevate to GT's row label position. For RegtableResult we default to the variable column automatically; for plain DataFrames, pass the column name.

None
apply_theme bool

Apply the journal-preset gt theme. Set False to keep gt's defaults (useful when the caller wants to compose their own tab_style chain).

True

Returns:

Type Description
GT

A fresh GT instance. Chain .tab_style(...), .opt_*, .tab_spanner(...), etc. as you would in pure great_tables.

Examples:

>>> import statspai as sp
>>> rt = sp.regtable(model, template="aer", title="Returns to Schooling")
>>> g = sp.gt(rt)               # ready-to-render GT
>>> g.as_raw_html()             # for HTML export / Quarto
>>> g.as_latex()                # for LaTeX export
>>> # Plain DataFrame path:
>>> import pandas as pd
>>> df = pd.DataFrame({"var": ["x", "y"], "M1": ["0.5***", "0.3"]})
>>> sp.gt(df, rowname_col="var", title="Custom table")

Raises:

Type Description
ImportError

If great_tables is not installed. Install via pip install great_tables.

TypeError

If result is not adaptable to a DataFrame.

is_great_tables_available

is_great_tables_available() -> bool

Return True iff great_tables can be imported in this env.
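
One common way to implement an availability check without importing the package is importlib.util.find_spec. This is a sketch of the idea only; how StatsPAI performs the check internally is not stated here, and module_available is a hypothetical helper.

```python
import importlib.util


def module_available(name):
    """True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


print(module_available("json"))
print(module_available("great_tables"))  # depends on the environment
```

Guarding optional exports with a check like this lets callers fall back to another renderer instead of hitting an ImportError.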

csl_url

csl_url(name: str) -> str

Return the canonical Zotero/styles URL for a CSL preset.

Use the URL once at project setup:

    curl -O $(python -c "import statspai as sp; print(sp.csl_url('aer'))")
    # → american-economic-association.csl in the current directory

Then point Quarto at the local copy via csl: paper-style.csl in the YAML header. PaperDraft.to_qmd(csl='aer') does the local filename pass-through automatically.

Raises:

Type Description
ValueError

Unknown short name. Use :func:list_csl_styles to enumerate.

csl_filename

csl_filename(name: str) -> str

Return the canonical .csl filename (no path) for a preset.

Useful when emitting a csl: ... line into a Quarto YAML header where the user has already downloaded the style file alongside paper.qmd.

list_csl_styles

list_csl_styles() -> List[Tuple[str, str]]

List (short_name, full_label) pairs for every registered style.

parse_citation_to_bib

parse_citation_to_bib(citation: str, key: Optional[str] = None) -> Dict[str, Any]

Parse a citation string into a BibTeX-shaped dict.

Returns a dict with at least key, type (article / misc), and as many of (author, year, title, journal) as the regex can extract.

For full-fidelity bibliographies, write your paper.bib by hand or via Zotero — this is a "quick-start" parser, deliberately conservative.

make_bib_key

make_bib_key(citation: str) -> str

Compute a stable BibTeX key from a free-form citation string.

Format: firstauthor + year + first-title-word, e.g. "callaway2021difference". Falls back to a hash-derived key when we can't parse author+year.
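
The key format can be illustrated with a small regex-based sketch. It assumes citations shaped like "Author, ... (Year). Title ..."; the pattern, the "ref" prefix, and the 8-character hash fallback are all assumptions, and sketch_bib_key is a hypothetical stand-in for the real function.

```python
import hashlib
import re

# Surname at the start, a four-digit year in parentheses, then the first title word.
_PATTERN = re.compile(r"^\s*([A-Za-z]+)[^()]*\((\d{4})\)[.,]?\s*([A-Za-z]+)")


def sketch_bib_key(citation):
    """firstauthor + year + first-title-word, else a hash-derived key."""
    m = _PATTERN.search(citation)
    if m:
        author, year, word = m.groups()
        return f"{author}{year}{word}".lower()
    digest = hashlib.sha256(citation.encode("utf-8")).hexdigest()[:8]
    return "ref" + digest


print(sketch_bib_key(
    "Callaway, B. and Sant'Anna, P. (2021). Difference-in-Differences "
    "with Multiple Time Periods. Journal of Econometrics."
))
print(sketch_bib_key("unparseable free-form note"))
```

Because the fallback hashes the citation text itself, the same string always maps to the same key.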

citations_to_bib_entries

citations_to_bib_entries(citations: Iterable[str]) -> List[Dict[str, Any]]

Parse a sequence of citation strings into BibTeX-entry dicts.

Deduplicates by key — the first occurrence wins (matches the replication_pack semantics where inner estimators register citations before outer wrappers).
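
First-occurrence-wins deduplication is straightforward with an insertion-ordered dict. A minimal sketch over already-parsed entries; dedupe_first_wins is a hypothetical helper and the entries are toy data:

```python
def dedupe_first_wins(entries):
    """Keep the first entry seen for each key, preserving input order."""
    seen = {}
    for entry in entries:
        seen.setdefault(entry["key"], entry)  # later duplicates are ignored
    return list(seen.values())


entries = [
    {"key": "a2020x", "source": "inner estimator"},
    {"key": "b2021y", "source": "inner estimator"},
    {"key": "a2020x", "source": "outer wrapper"},
]
for entry in dedupe_first_wins(entries):
    print(entry["key"], "from", entry["source"])
```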

write_bib

write_bib(citations: Iterable[Union[str, Dict[str, Any]]], path: Union[str, Path], *, append: bool = False, header: bool = True) -> Path

Write a clean BibTeX file from citation strings or entry dicts.

Parameters:

Name Type Description Default
citations iterable

Either free-form citation strings (parsed via :func:parse_citation_to_bib) or pre-built entry dicts {"key": ..., "type": ..., "fields": {...}}.

required
path str or Path

Destination .bib file. Parent dirs are created.

required
append bool

Append to an existing file rather than overwriting.

False
header bool

Prepend a one-line % Auto-generated by StatsPAI ... comment at the top of fresh files (skipped on append).

True

Returns:

Type Description
Path

Resolved path of the written file.

Notes

Deduplicates by computed bib key. Pre-built entry dicts are taken as-is; only string citations go through the regex parser.
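
For reference, a pre-built entry dict of the shape {"key": ..., "type": ..., "fields": {...}} maps onto BibTeX text roughly as sketched below. render_entry is a hypothetical helper; the exact field ordering, escaping, and whitespace that write_bib emits are not specified here.

```python
def render_entry(entry):
    """Render one entry dict as a BibTeX record (no escaping, sketch only)."""
    lines = ["@%s{%s," % (entry["type"], entry["key"])]
    for name, value in entry["fields"].items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


entry = {
    "key": "callaway2021difference",
    "type": "article",
    "fields": {
        "author": "Callaway, Brantly and Sant'Anna, Pedro H. C.",
        "year": "2021",
        "title": "Difference-in-Differences with Multiple Time Periods",
        "journal": "Journal of Econometrics",
    },
}
print(render_entry(entry))
```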