statspai.output
Output utilities for regression and causal-inference results.
The package is organised by purpose:
Regression-table renderers (historically four entry points, see the PR-B
design doc; regtable is the canonical one):
- regtable: canonical multi-model regression-table renderer. Supports text / HTML / LaTeX / Markdown / Quarto / Excel / Word / DataFrame output, journal templates, multi-row SE, and reproducibility provenance.
- esttab: Stata estout/esttab clone (use eststo to register models, then esttab to print). Thin Stata-flavoured surface.
- modelsummary: R modelsummary clone (functional API).
- outreg2 / OutReg2: Stata outreg2 clone (Excel-first surface).
Single-table helpers:
- tab: Stata-style tabulate.
- sumstats: descriptive summary statistics.
- balance_table: covariate-balance table.
- mean_comparison: two-group mean comparison with t-test / ranksum / chi2 (lives in mean_comparison.py since v1.6.x; re-exported from regression_table for backward compatibility).
Multi-table / paper bundles:
- paper_tables: Main / Heterogeneity / Robustness panels.
- Collection / collect: narrative document builder.
Plotting:
- coefplot: coefficient plot.
Provenance / replication / citations:
- Provenance, attach_provenance, get_provenance, compute_data_hash, format_provenance, lineage_summary.
- ReplicationPack, replication_pack.
- cite, CSL_REGISTRY, csl_url, ...
Adapters:
- to_gt, is_great_tables_available: great_tables adapter (lazy).
RegtableResult
Rich result object for regression tables with multi-format export.
to_latex
to_latex(*, siunitx: bool = False, threeparttable: bool = False, siunitx_preamble: bool = False) -> str
Render the table as a LaTeX table float.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| siunitx | bool | Decimal-align the numeric columns with | False |
| threeparttable | bool | Wrap the table in | False |
| siunitx_preamble | bool | Prepend a comment line listing the required | False |
to_markdown
Render the table as Markdown.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| quarto | bool | When | False |
to_quarto
Render as a Quarto-cross-referenceable Markdown table.
Builds on to_markdown and appends a Quarto caption block of the form:
: <caption> {#tbl-<label>}
which lets the manuscript reference the table via @tbl-<label>. The tbl- prefix is auto-prepended when the user passes a bare quarto_label="main".
Behaviour:
- quarto_label is required. Without it, ValueError is raised, because Quarto cross-references need an id.
- quarto_caption falls back to title when not provided. If neither is set, a generic "Regression results" caption is used and a warning is emitted.
- The leading title line is dropped (the caption block replaces it) to avoid duplicating the heading.
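The caption rules listed above can be sketched as a small function. quarto_caption_block is a hypothetical name and the warning text is an assumption; only the output shape (: <caption> {#tbl-<label>}), the tbl- auto-prefix, the ValueError, and the fallback order come from the description.

```python
import warnings
from typing import Optional


def quarto_caption_block(label: Optional[str], caption: Optional[str] = None,
                         title: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the to_quarto caption rules."""
    if not label:
        # Quarto cross-references need an id, so a missing label is an error.
        raise ValueError("quarto_label is required for Quarto cross-references")
    if not label.startswith("tbl-"):
        label = "tbl-" + label  # a bare "main" becomes "tbl-main"
    text = caption or title
    if text is None:
        warnings.warn("no quarto_caption or title given; using a generic caption")
        text = "Regression results"
    return f": {text} {{#{label}}}"


print(quarto_caption_block("main", caption="Main results"))  # : Main results {#tbl-main}
```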
to_dict
Return a JSON-safe dict representation of the table.
The package is agent-native (CLAUDE.md §1): a rendered regression table is a first-class artifact an LLM tool loop should be able to serialise, cache, and reason over without re-rendering. The payload carries three layers:
- metadata: model_labels, dep_var_labels, panel_labels, title, notes, template, se_type, stars / star_levels, requested_stats, coef_labels.
- table: the rendered cell grid (the same strings to_dataframe produces: "2.067***", "(0.074)", …), as a list of {"term": ..., <model label>: <cell>} records.
- models: the numeric truth per model (coefficient estimates, standard errors, t / p values, confidence bounds, summary stats and the dependent variable). Use this layer for machine reasoning; use table for faithful re-display.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| renders | bool or sequence of str | When truthy, also embed fully rendered strings under | None |
Returns:
| Type | Description |
|---|---|
| dict | JSON-safe; round-trips through |
Examples:
to_json
Serialise to_dict via json.dumps.
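Because the payload consists only of dicts, lists, strings and numbers, it survives a json.dumps / json.loads round trip unchanged. The payload below is a hand-written illustration of the three layers; its inner field names (label, depvar, coef, se) are assumptions rather than the real schema, and the numbers simply reuse the example cells quoted above. Note that tuples would come back from JSON as lists, which is one reason a JSON-safe payload stores sequences such as star_levels as lists.

```python
import json

# Illustrative payload only: the layer names come from the description above,
# the fields inside each layer are hypothetical.
payload = {
    "metadata": {"model_labels": ["(1)"], "se_type": "se",
                 "star_levels": [0.10, 0.05, 0.01]},
    "table": [
        {"term": "x1", "(1)": "2.067***"},
        {"term": "", "(1)": "(0.074)"},
    ],
    "models": [
        {"label": "(1)", "depvar": "y", "coef": {"x1": 2.067}, "se": {"x1": 0.074}},
    ],
}

text = json.dumps(payload)   # the serialisation step to_json is described as doing
restored = json.loads(text)

# Machine reasoning reads the numeric layer, not the display strings.
model = restored["models"][0]
t_stat = model["coef"]["x1"] / model["se"]["x1"]
print(round(t_stat, 2))
```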
from_dict (classmethod)
Reconstruct a RegtableResult from a to_dict payload.
The inverse of to_dict. Rebuilds one normalised model per
entry in the models layer (coefficient estimates, SE, t, p, CI,
summary stats, dependent variable), re-splits them into panels per
render_spec.panel_sizes, and restores the render-controlling
metadata (labels, fmt, se_type, stars / star levels, stats,
keep / drop / order, add_rows, alpha, template).
For a table built without exotic options this round-trips exactly:
>>> import statspai as sp
>>> t = sp.regtable(m1, m2, template="aer")  # doctest: +SKIP
>>> RegtableResult.from_dict(t.to_dict()).to_latex() == t.to_latex()
True
Notes
Exotic features that are not part of the to_dict payload —
stacked multi_se rows, eform transforms, column_spanners,
tests rows, and custom apply_coef transforms — are NOT
preserved across the round-trip (they reconstruct as a plain table).
The serialised table / renders layers already capture their
rendered form if you only need to re-display, not re-compute.
to_excel
Export table to Excel as a strict book-tab three-line table.
Uses the shared _excel_style primitives so the visual output
is byte-aligned with sumstats, tab, paper_tables,
collection, modelsummary and outreg2: thick top rule
above the column header, thin mid rule between header and body,
thick bottom rule below the last data row, Times New Roman
throughout.
to_word
Export table to Word (.docx) file in AER/QJE book-tab style.
The exported document follows economics-journal conventions: a heavy top rule, thin mid rule below the header, heavy bottom rule above notes, and no internal vertical borders. Body text is Times New Roman 10pt; the notes paragraph is 8pt italic.
to_docx
Alias for to_word; mirrors the Stata outreg2 convention.
EstimateTableResult
Stata esttab result handle: a thin wrapper over a RegtableResult.
Preserves the EstimateTableResult type identity for callers that
do isinstance(x, EstimateTableResult). Forwards every render
method to the underlying regtable result; adds to_csv() for
parity with the legacy esttab API (regtable does not natively
expose CSV but the dataframe path is byte-identical to what the
legacy esttab produced).
MeanComparisonResult
Rich result object for balance / mean comparison tables.
PaperTables (dataclass)
Container for the multi-panel paper-table bundle.
Each attribute is a RegtableResult (has .text, .latex, .html,
.to_latex(), ...). Iterate via .panels() or access by name:
pt.main / pt.heterogeneity / pt.robustness / pt.placebo.
panels
panels() -> Dict[str, RegtableResult]
Return {name: RegtableResult} for every populated panel.
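A minimal sketch of how a panels() accessor can skip unpopulated panels while keeping declaration order. PanelBundle is a hypothetical stand-in for PaperTables; in the real class each value would be a RegtableResult rather than the placeholder strings used here.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class PanelBundle:
    """Hypothetical stand-in for PaperTables; values would be RegtableResult objects."""
    main: Optional[Any] = None
    heterogeneity: Optional[Any] = None
    robustness: Optional[Any] = None
    placebo: Optional[Any] = None

    def panels(self) -> Dict[str, Any]:
        # Walk the fields in declaration order and drop panels that are still None.
        return {f.name: getattr(self, f.name) for f in fields(self)
                if getattr(self, f.name) is not None}


bundle = PanelBundle(main="main table", placebo="placebo table")
print(list(bundle.panels()))  # ['main', 'placebo']
```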
to_latex
Concatenate every panel's LaTeX into a single file.
If path is provided, also write to disk.
to_markdown
Return all panels stacked as GitHub-flavoured Markdown.
to_dict
Return a JSON-safe dict representation of the multi-panel bundle.
Agent-native counterpart to the renderers: each populated panel
(main / heterogeneity / robustness / placebo) carries
the full RegtableResult.to_dict payload (metadata + rendered
grid + numeric truth), so an LLM tool loop can cache and reason over
the whole bundle without re-rendering.
to_docx
Write every populated panel into a single .docx file.
Each panel renders as an AER/QJE book-tab table (heavy top rule, thin mid rule, heavy bottom rule, no internal borders). Panels are separated by a page break so the file lands on the co-author's desk ready for direct insertion into a manuscript.
Returns the file path that was written.
to_xlsx
Write every populated panel as a separate sheet in one workbook.
Sheet names are the panel names (main / heterogeneity /
robustness / placebo). Each sheet uses the AER book-tab
style: heavy rules above the header, below the header, and below
the last data row.
Returns the file path that was written.
Collection
A named, ordered bundle of tables and prose for a single document.
Add items in any order via the add_* methods (each returns
self so calls can be chained). Render to any supported format
via save(path) or one of the explicit to_* methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| title | str | Displayed at the top of the rendered document. | None |
| template | ('aer', 'qje', 'econometrica', 'restat') | Forwarded to | 'aer' |
to_frame
Return a semantic long-format view of the collection.
This is the programmatic counterpart to Stata's collect layout
and R's modelsummary/gt data pipeline: every rendered cell
is represented as one row with stable dimensions
item / kind / term / statistic / model and both
a raw numeric value (when parseable) and the display
formatted string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_text | bool | Include free-form text and headings as rows. Table workflows usually leave this off; document-audit workflows may want it. | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long-format cell table with provenance-friendly dimensions. |
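The long format keeps, for every rendered cell, both the display string and a numeric value parsed from it when possible. A minimal stdlib sketch of that idea follows; parse_cell, to_long, and the statistic labels "estimate" / "second_row" are hypothetical, and the regex deliberately ignores thousands separators and Unicode minus signs.

```python
import re
from typing import Dict, List, Optional

# First signed decimal number in a cell; "1,234" or a Unicode minus are not handled.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_cell(display: str) -> Optional[float]:
    """Hypothetical parser: numeric value of a formatted cell, or None if there is none."""
    match = _NUMBER.search(display)
    return float(match.group()) if match else None


def to_long(grid: Dict[str, Dict[str, List[str]]]) -> List[dict]:
    """One row per cell, keeping both the parsed value and the display string."""
    rows = []
    for term, by_model in grid.items():
        for model, (estimate, second) in by_model.items():
            for statistic, display in (("estimate", estimate), ("second_row", second)):
                rows.append({"term": term, "model": model, "statistic": statistic,
                             "value": parse_cell(display), "display": display})
    return rows


rows = to_long({"x1": {"(1)": ["2.067***", "(0.074)"]}})
for row in rows:
    print(row)
```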
to_csv
Render to_frame to CSV; optionally write to path.
to_dict
Return a JSON-safe dict representation of the whole document.
Agent-native counterpart to save: every item (regression
table, balance / summary table, free text) is serialised in order so
an LLM tool loop can cache and reason over a multi-table document
without re-rendering. Regression-table items carry the full
RegtableResult.to_dict payload (metadata + rendered grid +
numeric truth).
add_regression
add_regression(*results, name: Optional[str] = None, title: Optional[str] = None, **regtable_kwargs) -> 'Collection'
Add a regression table built from one or more model results.
regtable_kwargs are forwarded verbatim to sp.regtable.
add_table
Add an already-built RegtableResult / MeanComparisonResult.
add_summary
add_summary(data: DataFrame, vars: Optional[Sequence[str]] = None, *, stats: Optional[Sequence[str]] = None, name: Optional[str] = None, title: Optional[str] = None, labels: Optional[Dict[str, str]] = None) -> 'Collection'
Add a descriptive-statistics table built from a DataFrame.
Stores the underlying DataFrame; rendering re-uses the existing
sumstats formatters so the AER book-tab style applies.
add_balance
add_balance(data: DataFrame, treatment: str, variables: Sequence[str], *, weights: Optional[str] = None, test: str = 'ttest', name: Optional[str] = None, title: Optional[str] = None, fmt: str = '%.3f') -> 'Collection'
Add a treatment vs. control balance table (calls mean_comparison).
add_text
Add a free-form text block (rendered as a paragraph).
add_heading
Add a section heading (level 1-3).
to_markdown
Render to GitHub-flavoured Markdown; optionally write to path.
to_html
Render to a single self-contained HTML document.
to_latex
Concatenate every item's LaTeX into one .tex file.
to_docx
Write the entire collection to a single .docx file.
Each item renders in turn — headings as Word headings, text as paragraphs, tables in AER book-tab style — separated by page breaks between tables.
CollectionItem (dataclass)
One entry in a Collection.
Provenance (dataclass)
A traceable record of how a single estimate was produced.
Attributes:
| Name | Type | Description |
|---|---|---|
| function | str | Fully qualified function name (e.g. |
| params | dict | JSON-serialisable summary of the call arguments. |
| data_hash | str or None | 12-char SHA-256 prefix of the input data, or None when the data was too large to hash (>1M rows) or wasn't a recognised type. |
| data_shape | list[int] or None | |
| run_id | str | Per-call uuid4; disambiguates two structurally identical calls in the same session. |
| statspai_version | str | Package version at the time of the call. |
| python_version | str | |
| timestamp | str | ISO-8601 wall-clock time at which the call returned. |
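The attribute list above maps naturally onto a dataclass whose dynamic fields are filled by default factories. ProvenanceSketch is a hypothetical mirror, not the real class: the "unknown" version placeholder, the example function name, and the use of UTC for the timestamp are all assumptions.

```python
import json
import platform
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class ProvenanceSketch:
    """Hypothetical mirror of the Provenance attributes listed above."""
    function: str
    params: Dict[str, Any]
    data_hash: Optional[str] = None
    data_shape: Optional[List[int]] = None
    # A fresh uuid4 per call tells apart two otherwise identical calls.
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    statspai_version: str = "unknown"  # placeholder; the real record stores the package version
    python_version: str = field(default_factory=platform.python_version)
    # UTC is an assumption; the table above only says ISO-8601 wall-clock time.
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


rec = ProvenanceSketch("example.fit", {"formula": "y ~ x1"}, data_shape=[100, 3])
print(json.dumps(asdict(rec), indent=2))
```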
ReplicationPack
Lightweight summary returned by replication_pack.
Mostly for testing / programmatic inspection; users typically just
care about output_path.
regtable
regtable(*args, panel_labels: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, dep_var_labels: Optional[List[str]] = None, model_labels: Optional[List[str]] = None, keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, stats: Optional[Sequence[str]] = None, se_type: str = 'se', stars: bool = True, star_levels: Optional[Tuple[float, ...]] = None, fmt: str = '%.3f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[List[str]] = None, add_rows: Optional[Dict[str, List[str]]] = None, alpha: float = 0.05, template: Optional[str] = None, diagnostics: Union[str, bool] = 'auto', multi_se: Optional[Dict[str, Sequence[Any]]] = None, repro: Union[bool, Dict[str, Any], None] = None, quarto_label: Optional[str] = None, quarto_caption: Optional[str] = None, eform: Union[bool, Sequence[bool]] = False, column_spanners: Optional[Sequence[Tuple[str, int]]] = None, coef_map: Optional[Dict[str, str]] = None, consistency_check: bool = True, estimate: Optional[str] = None, statistic: Optional[str] = None, notation: Union[str, Tuple[str, ...]] = 'stars', apply_coef: Optional[Any] = None, apply_coef_deriv: Optional[Any] = None, escape: bool = True, tests: Optional[Dict[str, Sequence[Any]]] = None, fixef_sizes: bool = False, vcov: Optional[str] = None, transpose: bool = False) -> RegtableResult
Unified publication-quality regression table.
Accepts model results as positional arguments. If the first argument is a list, each list is treated as a separate panel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *args | model results or lists of model results | | () |
| panel_labels | list of str | Labels for each panel (e.g., | None |
| coef_labels | dict | Rename variables: | None |
| dep_var_labels | list of str | Dependent variable labels shown below column headers. | None |
| model_labels | list of str | Column header labels. Defaults to | None |
| keep | list of str | Only show these variables. | None |
| drop | list of str | Hide these variables. | None |
| order | list of str | Reorder variables. | None |
| stats | list of str | Summary statistics. Defaults to | None |
| se_type | str | What to show beneath coefficients: | "se" |
| stars | bool | Append significance stars. | True |
| star_levels | tuple | Thresholds for | (0.10, 0.05, 0.01) |
| fmt | str | Format string for numeric values. Pass any C-style format | "%.3f" |
| output | str | Controls what | "text" |
| filename | str | Save the table to this file path. The format is chosen from the file extension | None |
| quarto_label | str | Quarto cross-reference id. Pass | None |
| quarto_caption | str | Caption rendered alongside the Quarto cross-ref id. Falls back to | None |
| title | str | Table title / caption. | None |
| notes | list of str | Additional notes beneath the table. | None |
| add_rows | dict | Custom rows: | None |
| alpha | float | Significance level used when | 0.05 |
| template | str | Journal preset name. One of | None |
| diagnostics | ('auto', 'off') | Auto-extract publication-quality diagnostic rows from the result objects: | 'auto' |
| multi_se | dict | Stack additional SE specifications under the primary SE row. Keys are display labels (e.g. | None |
| repro | bool or dict | Append a reproducibility metadata note (StatsPAI version, optional seed and data hash, timestamp) as the last footer line. | None |
| eform | bool or list of bool | Report exponentiated coefficients: odds ratios for | False |
| column_spanners | list of (label, span) | Multi-row header above the model labels; each tuple groups | None |
| coef_map | dict | Single-shot rename + reorder + drop. Mirrors R | None |
| consistency_check | bool | When two or more columns are passed and their sample sizes differ, emit a | True |
| estimate | str | Custom format string for the top (coefficient) line in each cell. Mirrors R | None |
| statistic | str | Custom format string for the bottom (statistic) line in each cell. Same placeholders as | None |
| notation | "stars" or "symbols" or tuple of 3 strings | Family of significance markers used when | 'stars' |
| apply_coef | callable | Apply an arbitrary transformation | None |
| apply_coef_deriv | callable | Derivative | None |
| escape | bool | Auto-escape user-supplied label strings | True |
| tests | dict | Render hypothesis-test rows in the diagnostic strip below the stats block. Keys are display labels ("F-test x1=0", "Hansen J p-value", "Wald χ²"); values are sequences whose length equals … Stars honour the configured | None |
| fixef_sizes | bool | Auto-emit "# Firm: 1,234" / "# Year: 30" rows showing the number of distinct levels per fixed effect. Reads | False |
| vcov | str | Recompute the SE / t / p / 95% CI columns at print time using a different variance estimator. Currently supports OLS-style results that store … Columns whose underlying result lacks the X/residuals fields emit a | None |
| transpose | bool | Render with axes swapped: rows become models, columns become variables. Single-panel only; multi-panel input or | False |
Returns:
| Type | Description |
|---|---|
| RegtableResult | Object with |
Examples:
>>> import statspai as sp
>>> m1 = sp.regress("y ~ x1", data=df)
>>> m2 = sp.regress("y ~ x1 + x2", data=df)
>>> sp.regtable(m1, m2)
>>> sp.regtable(m1, m2, output="latex", filename="table1.tex")
>>> sp.regtable([m1, m2], [m3, m4],
... panel_labels=["Panel A: OLS", "Panel B: IV"])
>>>
>>> # Logit odds ratios
>>> sp.regtable(sp.logit("y ~ x", data=df), eform=True)
>>>
>>> # IV three-block table with column spanners
>>> sp.regtable(
... ols1, ols2, iv1, iv2,
... column_spanners=[("OLS", 2), ("IV", 2)],
... stats=["N", "R2", "depvar_mean", "depvar_sd"],
... )
>>>
>>> # Unified coef_map (rename + order + drop in one shot)
>>> sp.regtable(m1, m2, coef_map={
... "x2": "Education",
... "x1": "Experience",
... "Intercept": "Constant",
... })
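The default star_levels of (0.10, 0.05, 0.01) and fmt of "%.3f" combine into cells such as "2.067***" over "(0.074)". Below is a minimal sketch of that formatting, assuming one star per threshold the p-value falls strictly below; stars_for and format_cell are hypothetical helpers, not regtable internals.

```python
from typing import Sequence, Tuple


def stars_for(p: float, star_levels: Sequence[float] = (0.10, 0.05, 0.01)) -> str:
    """One star per threshold that p falls strictly below (assumed convention)."""
    return "*" * sum(p < level for level in star_levels)


def format_cell(coef: float, se: float, p: float, fmt: str = "%.3f") -> Tuple[str, str]:
    """Top line: estimate plus stars; bottom line: SE in parentheses."""
    # C-style format strings such as "%.3f" are applied with the % operator.
    return (fmt % coef) + stars_for(p), "(" + (fmt % se) + ")"


print(format_cell(2.0671, 0.0744, 0.001))  # ('2.067***', '(0.074)')
```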
eststo
Store a model result (like Stata's estimates store).
esttab
esttab(*results, names: Optional[Sequence[str]] = None, se: bool = True, t: bool = False, p: bool = False, ci: bool = False, stars: bool = True, star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), keep: Optional[Sequence[str]] = None, drop: Optional[Sequence[str]] = None, order: Optional[Sequence[str]] = None, labels: Optional[Dict[str, str]] = None, stats: Optional[Sequence[str]] = None, fmt: str = '%.4f', output: str = 'text', filename: Optional[str] = None, title: Optional[str] = None, notes: Optional[Sequence[str]] = None, alpha: float = 0.05) -> EstimateTableResult
Stata-style esttab: a thin facade over sp.regtable.
Deprecated: esttab is now a thin wrapper over statspai.output.regtable. Use
sp.regtable(*models, ...) directly for full control.
Accepts model results directly as positional arguments or reads
from the global store populated by eststo. If both
positional arguments and a non-empty store exist, positional
arguments take precedence.
Parameters mirror the original esttab API; see the module
docstring for the exact mapping to regtable.
coefplot
coefplot(*models, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, ax=None, figsize: tuple = (8, 6), colors: Optional[List[str]] = None, title: Optional[str] = None, alpha: float = 0.05)
Forest plot comparing coefficients across models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *models | | Model result objects. | () |
| model_names | list of str | | None |
| variables | list of str | Which variables to plot. Default: all shared variables. | None |
| ax | matplotlib Axes | | None |
| figsize | tuple | | (8, 6) |
| colors | list of str | | None |
| title | str | | None |
| alpha | float | Significance level for CIs. | 0.05 |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | |
coefplot_tikz
coefplot_tikz(*models, model_names: Optional[List[str]] = None, variables: Optional[List[str]] = None, coef_labels: Optional[Dict[str, str]] = None, level: float = 0.95, title: Optional[str] = None, xlabel: str = 'Coefficient estimate', standalone: bool = False) -> str
Return pgfplots / TikZ source for a coefficient forest plot.
The vector-graphics, LaTeX-native counterpart to coefplot (which
returns a Matplotlib (fig, ax) that you can save with fig.savefig("plot.pdf") or
as .png). Each model becomes one \addplot series of point estimates
with horizontal confidence-interval error bars; variables run down the
y-axis with a dashed reference line at zero: the same forest layout as
coefplot, emitted as editable LaTeX.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *models | | Model result objects | () |
| model_names | list of str | Legend labels. Default | None |
| variables | list of str | Which coefficients to plot. Default: every shared variable, sorted. | None |
| coef_labels | dict | Rename variables on the y-axis, e.g. | None |
| level | float | Confidence level for the error bars (normal approximation) | 0.95 |
| title | str | Plot title. Default | None |
| xlabel | str | x-axis label. | "Coefficient estimate" |
| standalone | bool | When | False |
Returns:
| Type | Description |
|---|---|
| str | |
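The level parameter uses a normal approximation, so each error bar spans estimate ± z·se, where z is the (1 + level)/2 quantile of the standard normal (about 1.96 at level=0.95). The sketch computes that half-width with the standard library and emits one pgfplots \addplot series with explicit horizontal error bars. The helper names and the use of the variable's position as its y coordinate are assumptions; the exact source produced by coefplot_tikz may differ.

```python
from statistics import NormalDist
from typing import Dict, List, Tuple


def ci_half_width(se: float, level: float = 0.95) -> float:
    """Half-width of a normal-approximation CI: z_{(1+level)/2} * se."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * se


def addplot_series(coefs: Dict[str, Tuple[float, float]], order: List[str],
                   level: float = 0.95) -> str:
    """Hypothetical helper: one pgfplots series; y is the variable's position in order."""
    coords = []
    for y, name in enumerate(order):
        est, se = coefs[name]
        coords.append(f"({est:.4f},{y}) +- ({ci_half_width(se, level):.4f},0)")
    return (r"\addplot+[only marks, error bars/.cd, x dir=both, x explicit]"
            + " coordinates {" + " ".join(coords) + "};")


print(addplot_series({"x1": (0.5, 0.1), "x2": (-0.2, 0.05)}, ["x1", "x2"]))
```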
Examples:
balance_table
balance_table(data: DataFrame, treat: str, covariates: List[str], output: str = 'text', title: str = 'Balance Table', fmt: str = '%.3f', labels: Optional[Dict[str, str]] = None, test: str = 'ttest') -> Union[str, DataFrame]
Generate a balance table comparing treated and control groups.
Standard Table 1 for matching, DID, and RCT papers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| treat | str | Binary treatment variable (0/1). | required |
| covariates | list of str | Variables to check balance on. | required |
| output | str | 'text', 'latex', 'html', 'dataframe', or filepath. | 'text' |
| title | str | Table title. | 'Balance Table' |
| fmt | str | Number format. | '%.3f' |
| labels | dict | Variable labels. | None |
| test | str | Test for difference: 'ttest' or 'ranksum'. | 'ttest' |
Returns:
| Type | Description |
|---|---|
| str or DataFrame | |
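Each covariate row of a balance table reports the treated and control means, their difference, and a test of that difference. The sketch below uses Welch's unequal-variance t-statistic for the 'ttest' option; whether balance_table uses Welch's or the pooled-variance form is not stated here, so treat that choice, and the hypothetical helpers welch_t and balance_row, as assumptions.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence


def welch_t(treated: Sequence[float], control: Sequence[float]) -> float:
    """Welch two-sample t-statistic (unequal variances) for a difference in means."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


def balance_row(name: str, treated: Sequence[float],
                control: Sequence[float]) -> Dict[str, object]:
    """One hypothetical balance-table row: group means, their difference, and t."""
    return {"variable": name,
            "mean_treated": mean(treated),
            "mean_control": mean(control),
            "diff": mean(treated) - mean(control),
            "t": welch_t(treated, control)}


print(balance_row("age", [2, 3, 4], [1, 2, 3]))
```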
Examples:
collect
collect(title: Optional[str] = None, *, template: str = 'aer') -> Collection
Construct a fresh Collection.
Convenience factory mirroring Stata 17's collect workflow:
>>> import statspai as sp
>>> c = sp.collect("Wage analysis")
>>> c.add_regression(m1, m2, name="main")
>>> c.add_summary(df, vars=["wage", "educ"], name="desc")
>>> c.save("paper.docx")
cite
cite(result, term: Optional[str] = None, *, fmt: str = '%.3f', output: str = 'text', star_levels: Tuple[float, ...] = (0.1, 0.05, 0.01), second_row: str = 'se', alpha: float = 0.05, bold_estimate: bool = False) -> str
Format a single coefficient as an inline citation string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | object | Any StatsPAI result with either | required |
| term | str | Coefficient name. Defaults to the headline | None |
| fmt | str | | "%.3f" |
| output | str | One of | "text" |
| star_levels | tuple of float | Star thresholds; same convention as | (0.10, 0.05, 0.01) |
| second_row | str | What to put in parentheses after the estimate. One of: | "se" |
| alpha | float | CI level when | 0.05 |
| bold_estimate | bool | For | False |
Returns:
| Type | Description |
|---|---|
| str | The formatted inline citation string. |
list_journal_templates
Return the canonical names of every registered journal preset.
get_journal_template
Look up a journal preset by name (case-insensitive).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Template identifier (e.g. | required |
Returns:
| Type | Description |
|---|---|
| dict | A copy of the preset entry. Callers are free to mutate it. |
Raises:
| Type | Description |
|---|---|
| ValueError | If name is not a registered template. |
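Case-insensitive lookup plus a defensive copy is what lets callers mutate the returned preset safely. A minimal sketch, assuming a plain dict registry; _TEMPLATES and its contents are placeholders, not the real presets. A deep copy is used because presets may contain nested lists or dicts that a shallow copy would still share.

```python
import copy
from typing import Any, Dict

# Hypothetical registry; the real presets and their keys are not reproduced here.
_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "aer": {"fmt": "%.3f", "notes": []},
    "qje": {"fmt": "%.3f", "notes": []},
}


def get_template(name: str) -> Dict[str, Any]:
    key = name.lower()  # case-insensitive lookup
    if key not in _TEMPLATES:
        raise ValueError(f"unknown template {name!r}; choose from {sorted(_TEMPLATES)}")
    # Deep copy so callers can mutate the result without touching the registry.
    return copy.deepcopy(_TEMPLATES[key])


preset = get_template("AER")
preset["notes"].append("edited locally")
print(get_template("aer")["notes"])  # []
```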
attach_provenance
attach_provenance(result: Any, *, function: str, params: Optional[Mapping[str, Any]] = None, data: Optional[Any] = None, enabled: bool = True, overwrite: bool = False) -> Any
Attach a Provenance record as result._provenance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | object | The estimator result. Must accept attribute assignment; | required |
| function | str | Logical name of the producing call, e.g. | required |
| params | mapping | Call arguments. Will be summarised (frames hashed; long sequences truncated; non-serialisable values reduced to repr). | None |
| data | DataFrame / Series / ndarray | The estimator's input data. Used to compute a 12-char SHA-256 fingerprint. | None |
| enabled | bool | Set to False to skip provenance entirely (zero-overhead path). | True |
| overwrite | bool | If False (default) and | False |
Returns:
| Name | Type | Description |
|---|---|---|
| result | same object | Returned for chaining: |
Notes
Failures are swallowed. Provenance must never break the caller — if attribute assignment isn't possible, we no-op and move on.
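The never-raise contract can be sketched as a guarded setattr. attach_record is a hypothetical helper; in particular, reading "overwrite=False" as "keep an existing record" is an assumption, since the parameter description above is cut off.

```python
from typing import Any


def attach_record(result: Any, record: Any, *, enabled: bool = True,
                  overwrite: bool = False) -> Any:
    """Hypothetical sketch of a never-raising attach step."""
    if not enabled:
        return result  # zero-overhead path
    if not overwrite and getattr(result, "_provenance", None) is not None:
        return result  # assumed: keep an existing record unless overwrite=True
    try:
        result._provenance = record
    except (AttributeError, TypeError):
        pass  # e.g. builtins or __slots__ classes: no-op instead of breaking the caller
    return result  # returned for chaining


print(attach_record(3, "rec"))  # 3 (ints reject new attributes; nothing is raised)
```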
get_provenance
get_provenance(result: Any) -> Optional[Provenance]
Return result._provenance if present, else None.
Walks one level of common containers (dict, list,
tuple) — useful when an estimator returns a tuple
(result, diagnostics).
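Walking "one level" of containers means checking the object itself, then the direct members of a dict, list, or tuple, without recursing further. A minimal sketch with a hypothetical find_provenance helper; returning the first match found is an assumption.

```python
from typing import Any, Optional


def find_provenance(obj: Any) -> Optional[Any]:
    """Hypothetical sketch: check obj, then one level of dict / list / tuple members."""
    direct = getattr(obj, "_provenance", None)
    if direct is not None:
        return direct
    if isinstance(obj, dict):
        candidates = obj.values()
    elif isinstance(obj, (list, tuple)):
        candidates = obj
    else:
        return None
    for item in candidates:  # one level only, no recursion
        found = getattr(item, "_provenance", None)
        if found is not None:
            return found  # assumed: first match wins
    return None
```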
compute_data_hash
Return a short SHA-256 fingerprint of data, or None.
Accepts:
- pandas.DataFrame — order- and column-name-sensitive hash.
- pandas.Series — hashed via hash_pandas_object.
- numpy.ndarray — bytes-hashed (shape-included).
- bytes / bytearray — direct SHA-256.
Anything else returns None rather than raising — provenance
must never break the calling estimator.
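The bytes / bytearray branch and the "return None instead of raising" fallback are simple enough to sketch with hashlib alone. short_hash is a hypothetical name; the pandas and numpy branches described above are deliberately omitted, so every other type falls through to None.

```python
import hashlib
from typing import Any, Optional


def short_hash(data: Any, length: int = 12) -> Optional[str]:
    """Hypothetical sketch covering only the bytes / bytearray branch."""
    if isinstance(data, (bytes, bytearray)):
        return hashlib.sha256(bytes(data)).hexdigest()[:length]
    # pandas / numpy branches omitted; unsupported types return None, never raise.
    return None


print(short_hash(b"abc"))  # ba7816bf8f01
```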
format_provenance
format_provenance(prov: Provenance, *, indent: int = 2) -> str
Pretty multi-line rendering of a Provenance record.
lineage_summary
Aggregate a lineage report across multiple results.
Useful for sp.replication_pack / Quarto appendix generation:
pass every fitted result the paper depends on and get back a
{run_id: provenance_dict} map plus a deduped list of input
data hashes.
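The aggregation step amounts to indexing records by run_id and collecting distinct data hashes in first-seen order. A minimal sketch, assuming each record is a provenance dict with run_id and data_hash keys; lineage_sketch and its output keys ("runs", "data_hashes") are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def lineage_sketch(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical: map run_id -> record and dedupe the input data hashes."""
    runs: Dict[str, Dict[str, Any]] = {}
    hashes: List[str] = []
    for rec in records:
        runs[rec["run_id"]] = rec
        h = rec.get("data_hash")
        if h is not None and h not in hashes:  # dedupe, keep first-seen order
            hashes.append(h)
    return {"runs": runs, "data_hashes": hashes}


summary = lineage_sketch([
    {"run_id": "a", "data_hash": "h1"},
    {"run_id": "b", "data_hash": "h1"},
    {"run_id": "c", "data_hash": None},
])
print(summary["data_hashes"])  # ['h1']
```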
replication_pack
replication_pack(target: Any, output_path: Union[str, PathLike], *, data: Optional[Any] = None, code: Optional[str] = None, env: bool = True, bib: bool = True, paper_format: str = 'auto', title: str = 'Replication Pack', extra_files: Optional[Mapping[str, Union[str, bytes]]] = None, include_git_sha: bool = True, overwrite: bool = True) -> ReplicationPack
Build a replication archive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| target | object | Anything carrying analysis state. Best when it's a | required |
| output_path | str or PathLike | Destination | required |
| data | DataFrame / Series | Explicit dataset. When omitted, we try | None |
| code | str or PathLike | Either an inline Python script (multi-line string) or a path to a .py file. When omitted, | None |
| env | bool | Include | True |
| bib | bool | Write | True |
| paper_format | ('auto', 'md', 'qmd', 'tex', 'docx') | How to render the PaperDraft inside the archive. "auto" picks | "auto" |
| title | str | Used in | 'Replication Pack' |
| extra_files | mapping | | None |
| include_git_sha | bool | Capture | True |
| overwrite | bool | Overwrite an existing archive at | True |
Returns:
| Type | Description |
|---|---|
| ReplicationPack | Summary object. |
Examples:
to_gt
to_gt(result: Any, *, title: Optional[str] = None, subtitle: Optional[str] = None, notes: Optional[Sequence[str]] = None, template: Optional[str] = None, rowname_col: Optional[str] = None, apply_theme: bool = True) -> 'gt_pkg.GT | list'
Convert a StatsPAI table / DataFrame / Collection into a great_tables.GT.
Dispatches on the input type:
- statspai.output.RegtableResult: full-fidelity adapter that picks up the table's rendered cells, journal preset, title, and footer notes.
- statspai.output.PaperTables: flattens panels into a single GT with one tab_row_group per panel.
- statspai.output.Collection: converts each convertible item and returns a list[GT]; convertible items include RegtableResult, MeanComparisonResult, and any object with to_dataframe().
- pandas.DataFrame: wraps it verbatim. rowname_col promotes a column to row labels.
- Anything else with a to_dataframe() method: calls it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | object | See dispatch description above. | required |
| title | str | Override the table title. Default: pulled from | None |
| subtitle | str | Subtitle line | None |
| notes | sequence of str | Footer notes. When omitted, | None |
| template | str | Journal preset | None |
| rowname_col | str | Column to elevate to GT's row label position. For | None |
| apply_theme | bool | Apply the journal-preset gt theme. Set False to keep gt's defaults (useful when the caller wants to compose their own | True |
Returns:
| Type | Description |
|---|---|
| GT | A fresh GT instance. Chain |
Examples:
>>> import statspai as sp
>>> rt = sp.regtable(model, template="aer", title="Returns to Schooling")
>>> g = sp.gt(rt) # ready-to-render GT
>>> g.as_raw_html() # for HTML export / Quarto
>>> g.as_latex() # for LaTeX export
>>> # Plain DataFrame path:
>>> import pandas as pd
>>> df = pd.DataFrame({"var": ["x", "y"], "M1": ["0.5***", "0.3"]})
>>> sp.gt(df, rowname_col="var", title="Custom table")
Raises:
| Type | Description |
|---|---|
| ImportError | If |
| TypeError | If |
is_great_tables_available
Return True iff great_tables can be imported in this env.
csl_url
Return the canonical Zotero/styles URL for a CSL preset.
Use the URL once at project setup:
```bash
curl -O $(python -c "import statspai as sp; print(sp.csl_url('aer'))")
# → american-economic-association.csl in the current directory
```
Then point Quarto at the local copy via csl: paper-style.csl in
the YAML header. PaperDraft.to_qmd(csl='aer') does the local
filename pass-through automatically.
Raises:
| Type | Description |
|---|---|
| ValueError | Unknown short name. Use |
csl_filename
Return the canonical .csl filename (no path) for a preset.
Useful when emitting a csl: ... line into a Quarto YAML header
where the user has already downloaded the style file alongside
paper.qmd.
list_csl_styles
List (short_name, full_label) pairs for every registered style.
parse_citation_to_bib
Parse a citation string into a BibTeX-shaped dict.
Returns a dict with at least key, type (article /
misc), and as many of (author, year, title,
journal) as the regex can extract.
For full-fidelity bibliographies, write your paper.bib by
hand or via Zotero — this is a "quick-start" parser, deliberately
conservative.
make_bib_key
Compute a stable BibTeX key from a free-form citation string.
Format: firstauthor + year + first-title-word, e.g.
"callaway2021difference". Falls back to a hash-derived key when
we can't parse author+year.
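The firstauthor + year + first-title-word scheme can be approximated with two regular expressions and a hash fallback. bib_key is a hypothetical helper, and its parsing rules (surname is the first alphabetic run, the title starts after the first four-digit year, SHA-1 for the fallback) are assumptions; the real parser may handle more citation styles.

```python
import hashlib
import re


def bib_key(citation: str) -> str:
    """Hypothetical sketch of the firstauthor + year + first-title-word scheme."""
    author = re.match(r"\s*([A-Za-z]+)", citation)
    year = re.search(r"\b(1[89]\d{2}|20\d{2})\b", citation)
    if author and year:
        # Assume the title follows the year, e.g. "Surname, X. (2021). Title ...".
        words = re.findall(r"[A-Za-z]+", citation[year.end():])
        first_word = words[0].lower() if words else ""
        return author.group(1).lower() + year.group(1) + first_word
    # Fallback: a stable hash-derived key when author + year can't be parsed.
    return "ref" + hashlib.sha1(citation.encode("utf-8")).hexdigest()[:8]


cite = ("Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences "
        "with multiple time periods. Journal of Econometrics.")
print(bib_key(cite))  # callaway2021difference
```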
citations_to_bib_entries
Parse a sequence of citation strings into BibTeX-entry dicts.
Deduplicates by key — the first occurrence wins (matches the
replication_pack semantics where inner estimators register
citations before outer wrappers).
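"First occurrence wins" deduplication is a single pass with a set of seen keys. A minimal sketch over entry dicts carrying a key field; dedupe_by_key is a hypothetical name.

```python
from typing import Dict, Iterable, List


def dedupe_by_key(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first entry seen for each 'key'; later duplicates are dropped."""
    seen = set()
    kept = []
    for entry in entries:
        if entry["key"] in seen:
            continue
        seen.add(entry["key"])
        kept.append(entry)
    return kept


entries = [{"key": "a", "source": "inner"}, {"key": "b"}, {"key": "a", "source": "outer"}]
print(dedupe_by_key(entries))
```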
write_bib
write_bib(citations: Iterable[Union[str, Dict[str, Any]]], path: Union[str, Path], *, append: bool = False, header: bool = True) -> Path
Write a clean BibTeX file from citation strings or entry dicts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
citations
|
iterable
|
Either free-form citation strings (parsed via
:func: |
required |
path
|
str or Path
|
Destination |
required |
append
|
bool
|
Append to an existing file rather than overwriting. |
False
|
header
|
bool
|
Prepend a one-line |
True
|
Returns:
| Type | Description |
|---|---|
Path
|
Resolved path of the written file. |
Notes
Deduplicates by computed bib key. Pre-built entry dicts are taken as-is; only string citations go through the regex parser.
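Once an entry dict exists (at least key and type, per parse_citation_to_bib), rendering it as a BibTeX record is plain string assembly. to_bibtex is a hypothetical helper; the field order simply follows the dict, and escaping of special LaTeX characters is not handled.

```python
from typing import Dict


def to_bibtex(entry: Dict[str, str]) -> str:
    """Render one entry dict (with 'key' and 'type') as a BibTeX record."""
    fields = [f"  {name} = {{{value}}}" for name, value in entry.items()
              if name not in ("key", "type")]
    return f"@{entry['type']}{{{entry['key']},\n" + ",\n".join(fields) + "\n}\n"


print(to_bibtex({"key": "doe2021example", "type": "article",
                 "author": "Doe, J.", "year": "2021"}))
```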