`statspai.bartik`

bartik

Shift-Share (Bartik) Instrumental Variables for StatsPAI.

Constructs Bartik instruments from industry shares and national shocks, with diagnostics for instrument validity following Goldsmith-Pinkham, Sorkin, and Swift (2020) and Borusyak, Hull, and Jaravel (2022).

References

Goldsmith-Pinkham, P., Sorkin, I., and Swift, H. (2020). "Bartik Instruments: What, When, Why, and How." American Economic Review, 110(8), 2586-2624. [@goldsmithpinkham2020bartik]

Borusyak, K., Hull, P., and Jaravel, X. (2022). "Quasi-Experimental Shift-Share Research Designs." Review of Economic Studies, 89(1), 181-213. [@borusyak2022quasi]

BartikIV

Bartik shift-share IV estimator.

Constructs the shift-share instrument B_i = sum_k s_{ik} g_k from a region-by-industry share matrix and an industry shock vector, then runs 2SLS. The convenience wrapper `bartik` builds this object and calls `fit`; use the class directly when you want to keep the estimator instance.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> from statspai.bartik.shift_share import BartikIV
>>> rng = np.random.default_rng(0)
>>> n, K = 60, 4
>>> shares = pd.DataFrame(
...     rng.dirichlet(np.ones(K), size=n),
...     columns=[f"ind{k}" for k in range(K)],
... )
>>> shocks = pd.Series(rng.normal(0.05, 0.02, K), index=shares.columns)
>>> emp = shares.values @ shocks.values + rng.normal(0, 0.01, n)
>>> wage = 1.5 * emp + rng.normal(0, 0.02, n)
>>> df = pd.DataFrame({"wage_growth": wage, "emp_growth": emp})
>>> est = BartikIV(df, y="wage_growth", endog="emp_growth",
...                shares=shares, shocks=shocks, leave_one_out=False)
>>> res = est.fit()
>>> round(float(res.params["emp_growth"]), 1)  # ~1.5
1.5
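The construction described above (instrument first, then 2SLS) can be sketched with plain NumPy. This is an illustrative sketch, not StatsPAI's implementation: the helpers `bartik_instrument` and `iv_2sls` are hypothetical, the only control is an intercept, and the simulated data include an unobserved confounder `u` so that OLS would be biased while the Bartik instrument is not.

```python
import numpy as np


def bartik_instrument(shares: np.ndarray, shocks: np.ndarray) -> np.ndarray:
    """Shift-share instrument B_i = sum_k s_ik * g_k, one value per region."""
    return shares @ shocks


def iv_2sls(y: np.ndarray, x: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Just-identified 2SLS with an intercept: solve (Z'X) b = Z'y."""
    Z = np.column_stack([np.ones_like(z), z])  # instruments: constant + Bartik
    X = np.column_stack([np.ones_like(x), x])  # regressors: constant + endog
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(0)
n, K = 500, 4
shares = rng.dirichlet(np.ones(K), size=n)    # region x industry, rows sum to 1
shocks = rng.normal(0.0, 1.0, K)              # industry shocks g_k
z = bartik_instrument(shares, shocks)
u = rng.normal(0, 0.1, n)                     # unobserved confounder
emp = z + u + rng.normal(0, 0.1, n)           # endogenous regressor
wage = 1.5 * emp + u + rng.normal(0, 0.1, n)  # outcome, true effect 1.5
const, slope = iv_2sls(wage, emp, z)
print(f"2SLS slope: {slope:.3f}")
```

With a single instrument and an intercept, the 2SLS slope reduces to cov(B, y) / cov(B, x), which is a convenient sanity check on any implementation.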

rotemberg_weights `property`

rotemberg_weights: DataFrame

Rotemberg weight decomposition by industry.
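As a rough guide to what such a decomposition means, the sketch below follows the Goldsmith-Pinkham, Sorkin, and Swift (2020) identity in which the Bartik estimate equals a weighted sum of industry-specific just-identified IV estimates, beta_Bartik = sum_k alpha_k beta_k. It assumes intercept-only controls, and the helper `rotemberg_weights` here is hypothetical; it does not describe the columns of the DataFrame returned by the property.

```python
import numpy as np


def rotemberg_weights(y, x, shares, shocks):
    """Return (alpha_k, beta_k) for an intercept-only specification."""
    yd = y - y.mean()                          # M y, where M demeans
    xd = x - x.mean()                          # M x
    zx = shares.T @ xd                         # Z_k' M x for each industry k
    zy = shares.T @ yd                         # Z_k' M y for each industry k
    alpha = shocks * zx / np.sum(shocks * zx)  # Rotemberg weights, sum to 1
    beta_k = zy / zx                           # industry-k just-identified IV estimate
    return alpha, beta_k


rng = np.random.default_rng(1)
n, K = 300, 5
shares = rng.dirichlet(np.ones(K), size=n)
shocks = rng.normal(size=K)
z = shares @ shocks
x = z + rng.normal(0, 0.2, n)
y = 0.8 * x + rng.normal(0, 0.2, n)

alpha, beta_k = rotemberg_weights(y, x, shares, shocks)
zd = z - z.mean()
beta_bartik = (zd @ y) / (zd @ x)           # pooled Bartik 2SLS slope
top3 = np.argsort(-np.abs(alpha))[:3]       # industries driving the estimate
print(alpha.round(3), top3)
```

Large |alpha_k| flag the industries whose shares carry most of the identifying variation, which is why the exogeneity of those particular shares deserves the closest scrutiny.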

fit

fit() -> EconometricResults

Fit the Bartik IV model by two-stage least squares (2SLS).

ShiftSharePoliticalResult `dataclass`

Bases: ResultProtocolMixin

Structured output of `shift_share_political`.

Wraps a standard `CausalResult` (point estimate and SEs) plus the two Park-Xu (2026) diagnostics: share balance and Rotemberg top-K.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> units, inds = range(20), [f"I{k}" for k in range(5)]
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(5), size=len(units)),
...                       index=list(units), columns=inds)
>>> shocks = pd.Series(rng.normal(size=5), index=inds)
>>> rows = []
>>> for i in units:
...     dx = float((shares.loc[i] * shocks).sum()) + rng.normal(scale=0.1)
...     rows.append({"unit": i, "time": 0, "y": 0.0, "x": 0.0})
...     rows.append({"unit": i, "time": 1, "y": 0.4 * dx, "x": dx})
>>> df = pd.DataFrame(rows)
>>> res = sp.shift_share_political(
...     df, unit="unit", time="time", outcome="y", endog="x",
...     shares=shares, shocks=shocks,
... )
>>> bool(np.isfinite(res.estimate))
True
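The exact share-balance statistic used by Park-Xu (2026) and StatsPAI is not spelled out on this page. One common reading, in line with Goldsmith-Pinkham, Sorkin, and Swift's advice to check whether exposure shares correlate with pre-period characteristics, is to regress each industry's share on a standardized pre-treatment covariate. The helper `share_balance` below is a hypothetical sketch of that idea using classical (non-robust) standard errors.

```python
import numpy as np


def share_balance(shares: np.ndarray, covariate: np.ndarray):
    """Slope and classical t-stat from regressing each share column on a standardized covariate."""
    n, K = shares.shape
    w = (covariate - covariate.mean()) / covariate.std(ddof=1)
    X = np.column_stack([np.ones(n), w])
    XtX_inv = np.linalg.inv(X.T @ X)
    coefs = XtX_inv @ X.T @ shares             # 2 x K: intercept and slope per industry
    resid = shares - X @ coefs
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)
    se_slope = np.sqrt(sigma2 * XtX_inv[1, 1])
    return coefs[1], coefs[1] / se_slope


rng = np.random.default_rng(2)
n, K = 200, 4
shares = rng.dirichlet(np.ones(K), size=n)
covariate = shares[:, 0] + rng.normal(0, 0.01, n)  # deliberately unbalanced on industry 0
slopes, tstats = share_balance(shares, covariate)
print(slopes.round(3), tstats.round(1))
```

Because the shares in each row sum to one, the slopes across industries sum to zero, so imbalance in one industry always shows up as offsetting imbalance elsewhere.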

ShiftSharePoliticalPanelResult `dataclass`

Bases: ResultProtocolMixin

Structured output of `shift_share_political_panel`.

Attributes:

Name	Type	Description
`estimate`	`float`	Pooled 2SLS coefficient on `endog`.
`se`	`float`	Panel-clustered SE (clustered by unit by default). A shock-clustered SE from the underlying AKM correction is stored in `diagnostics['akm_se']`.
`ci`	`tuple`	`(lower, upper)` bounds of the `1 - alpha` confidence interval.
`per_period`	`DataFrame`	Per-period cross-sectional estimates (one row per `time`), useful for event-study-style dynamic effects.
`rotemberg_panel`	`DataFrame`	Rotemberg weights aggregated across periods (industries × statistics).
`share_balance`	`DataFrame`	Share-balance diagnostic table.
`n_units`	`int`	Number of units.
`n_periods`	`int`	Number of time periods.
`n_industries`	`int`	Number of industries (share columns).
`method`	`str`	Name of the estimation method.
`diagnostics`	`dict`	Estimator-side diagnostics: `fe` (fixed-effect mode), `cluster`, `akm_se`, `n_obs`, `first_stage_F`.
`model_info`	`dict`	Output-layer metadata: `model_type`, `method`, `fixed_effects` (the fixed-effect columns, rendered as `"unit+time"`), and `cluster`. Consumed by `statspai.regtable` to render per-FE and cluster rows automatically.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> units, times, inds = list(range(30)), [0, 1, 2, 3], list("AB")
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(2), size=len(units)),
...                       index=units, columns=inds)
>>> shocks = pd.DataFrame(
...     rng.normal(size=(len(times), 2)), index=times, columns=inds,
... )
>>> rows = []
>>> for i in units:
...     for t in times:
...         b = float((shares.loc[i] * shocks.loc[t]).sum())
...         x = b + rng.normal(scale=0.1)
...         rows.append({"u": i, "t": t, "y": 0.3 * x, "x": x})
>>> df = pd.DataFrame(rows)
>>> res = sp.shift_share_political_panel(
...     df, unit="u", time="t", outcome="y", endog="x",
...     shares=shares, shocks=shocks,
... )
>>> bool(np.isfinite(res.estimate))
True

bartik

bartik(data: DataFrame, y: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[List[str]] = None, leave_one_out: bool = True, regional_shocks: Optional[DataFrame] = None, robust: str = 'hc1', alpha: float = 0.05) -> EconometricResults

Estimate a shift-share (Bartik) instrumental-variables model by 2SLS.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Cross-sectional data with one row per region/unit.	required
`y`	`str`	Outcome variable.	required
`endog`	`str`	Endogenous regressor (e.g., local employment growth).	required
`shares`	`DataFrame`	Share matrix (n_units x n_industries). Rows = regions, cols = industries. Row index must align with data index.	required
`shocks`	`Series`	National shock vector (n_industries,). Index = industry names.	required
`covariates`	`list of str`	Exogenous control variables.	`None`
`leave_one_out`	`bool`	Compute leave-one-out shocks (exclude the own region from the national average). Only takes effect when `regional_shocks` is also supplied: without the per-region industry growth panel there is not enough information to reconstruct `g_k` excluding region `i`. When `leave_one_out=True` but `regional_shocks` is not provided, a `UserWarning` is issued and the estimator falls back to the simple Bartik instrument.	`True`
`regional_shocks`	`DataFrame`	Regional industry growth matrix (n_units x n_industries): row `i`, column `k` is the realised growth of industry `k` in region `i`. When provided with `leave_one_out=True`, the instrument uses the exact leave-one-out shock `g_k^{-i} = (sum_j g_{jk} - g_{ik}) / (n - 1)`, in the style of Borusyak-Hull-Jaravel (2022). Row index must align with `shares`; columns must be a superset of `shocks.index`.	`None`
`robust`	`str`	Standard error type.	`'hc1'`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`EconometricResults`	2SLS results with Bartik IV diagnostics.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n, K = 80, 4
>>> # shares: DataFrame (regions x industries), shocks: Series (industries)
>>> shares = pd.DataFrame(
...     rng.dirichlet(np.ones(K), size=n),
...     columns=[f"ind{k}" for k in range(K)],
... )
>>> shocks = pd.Series(rng.normal(0.05, 0.02, K), index=shares.columns)
>>> emp = shares.values @ shocks.values + rng.normal(0, 0.01, n)
>>> wage = 1.0 + 2.0 * emp + rng.normal(0, 0.02, n)
>>> df = pd.DataFrame({"wage_growth": wage, "emp_growth": emp})
>>> result = sp.bartik(df, y='wage_growth', endog='emp_growth',
...                    shares=shares, shocks=shocks, leave_one_out=False)
>>> _ = result.summary()
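The leave-one-out formula in the `regional_shocks` row can be checked with a few lines of NumPy. The helpers `leave_one_out_shocks` and `loo_bartik` are hypothetical names for this sketch; it only illustrates the arithmetic g_k^{-i} = (sum_j g_{jk} - g_{ik}) / (n - 1) and the resulting region-specific instrument, not how `bartik` stores or validates its inputs.

```python
import numpy as np


def leave_one_out_shocks(regional_growth: np.ndarray) -> np.ndarray:
    """g_k^{-i} = (sum_j g_jk - g_ik) / (n - 1): industry-k mean growth excluding region i."""
    n = regional_growth.shape[0]
    col_sums = regional_growth.sum(axis=0, keepdims=True)  # 1 x K national totals
    return (col_sums - regional_growth) / (n - 1)           # n x K, one row per region


def loo_bartik(shares: np.ndarray, regional_growth: np.ndarray) -> np.ndarray:
    """Region-specific instrument B_i = sum_k s_ik * g_k^{-i}."""
    return (shares * leave_one_out_shocks(regional_growth)).sum(axis=1)


rng = np.random.default_rng(3)
n, K = 6, 3
growth = rng.normal(size=(n, K))            # realised growth of industry k in region i
shares = rng.dirichlet(np.ones(K), size=n)
g_loo = leave_one_out_shocks(growth)
b_loo = loo_bartik(shares, growth)
print(b_loo.round(3))
```

Subtracting the own-region row from the column totals gives every leave-one-out mean in one vectorised step instead of n separate averages.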

ssaggregate

ssaggregate(data: DataFrame, y: str, x: str, shares: ndarray, shocks: Union[str, ndarray, Series] = None, shock_data: Optional[DataFrame] = None, controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> EconometricResults

Shift-share IV estimation with AKM (Adão, Kolesár, and Morales 2019) corrected standard errors.

Estimates a 2SLS regression where the instrument is a Bartik (shift-share) variable B_i = sum_k s_{ik} g_k, and corrects the variance-covariance matrix to account for cross-sectional correlation induced by shared shocks.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Observation-level data (n rows).	required
`y`	`str`	Outcome variable name.	required
`x`	`str`	Endogenous regressor, or the constructed Bartik variable itself, as a column in `data`. When `shocks` is `None`, `x` is taken to be the Bartik variable (reduced-form specification): the estimator runs OLS and corrects the SEs. Otherwise `x` is treated as an endogenous regressor and a Bartik IV is built from `shares` and `shocks` for the first stage.	required
`shares`	`array-like of shape (n, K)`	Exposure-share matrix. `shares[i, k]` is unit i's exposure to shock k.	required
`shocks`	`str or array-like of shape (K,)`	Shock vector. Either a 1-D array/Series of length K, or a column name in shock_data. If `None`, the Bartik variable in x is used directly (reduced-form mode).	`None`
`shock_data`	`DataFrame`	Shock-level DataFrame (K rows) when shocks is a column name.	`None`
`controls`	`list of str`	Exogenous control variables.	`None`
`cluster`	`str`	Observation-level cluster variable (not used in the AKM variance; retained for compatibility and diagnostics).	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`EconometricResults`	With AKM-corrected standard errors.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n, K = 80, 5
>>> shares = rng.dirichlet(np.ones(K), size=n)  # exposure shares, rows sum to 1
>>> g = rng.normal(0.0, 1.0, K)                 # industry-level shocks
>>> bartik = shares @ g                          # shift-share instrument
>>> emp = 0.7 * bartik + rng.normal(0, 0.5, n)   # endogenous regressor
>>> wage = 1.5 * emp + rng.normal(0, 0.5, n)     # outcome
>>> df = pd.DataFrame({"wage": wage, "emp": emp})
>>> df_shocks = pd.DataFrame({"industry_growth": g})
>>> result = sp.ssaggregate(
...     data=df,
...     y="wage",
...     x="emp",
...     shares=shares,
...     shocks="industry_growth",
...     shock_data=df_shocks,
... )
>>> bool(result.params.shape[0] >= 1)
True
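To make the idea of shock-level (AKM) correction concrete, the sketch below computes a just-identified 2SLS slope together with a variance of the form V = sum_k (gamma_k R_k)^2 / (sum_i Z~_i X_i)^2, where Z~ is the demeaned Bartik instrument, gamma is the coefficient vector from regressing Z~ on the shares, and R_k = sum_i s_ik e_i aggregates the 2SLS residuals by shock. This is a simplified reading of the AKM variance under intercept-only controls, without any degrees-of-freedom refinements; the helper `akm_style_se` is hypothetical and is not a copy of `ssaggregate`'s internals.

```python
import numpy as np


def akm_style_se(y, x, shares, shocks):
    """Just-identified 2SLS slope with an AKM-style SE and a heteroskedasticity-robust SE."""
    z = shares @ shocks
    zt = z - z.mean()                                     # instrument net of the intercept
    beta = (zt @ y) / (zt @ x)                            # 2SLS slope
    resid = (y - y.mean()) - beta * (x - x.mean())        # structural residuals
    gamma, *_ = np.linalg.lstsq(shares, zt, rcond=None)   # regress zt on the shares
    r_k = shares.T @ resid                                # share-weighted residual sums R_k
    denom = abs(zt @ x)
    se_akm = np.sqrt(np.sum((gamma * r_k) ** 2)) / denom  # shock-level variance
    se_het = np.sqrt(np.sum((zt * resid) ** 2)) / denom   # unit-level robust variance
    return beta, se_akm, se_het


rng = np.random.default_rng(0)
n, K = 80, 5
shares = rng.dirichlet(np.ones(K), size=n)
g = rng.normal(0.0, 1.0, K)
emp = 0.7 * (shares @ g) + rng.normal(0, 0.5, n)
wage = 1.5 * emp + rng.normal(0, 0.5, n)
beta, se_akm, se_het = akm_style_se(wage, emp, shares, g)
print(f"beta={beta:.3f}  se_akm={se_akm:.3f}  se_het={se_het:.3f}")
```

The contrast between the two SEs is the point of the correction: the robust SE treats regions as independent, while the AKM-style SE aggregates residuals to the shock level, where regions with similar shares are no longer independent draws.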

shift_share_se

shift_share_se(iv_result: EconometricResults, shares: ndarray, alpha: float = 0.05) -> EconometricResults

Correct standard errors of an existing IV result for shift-share structure.

Takes an `EconometricResults` object from a StatsPAI IV estimator and replaces its SEs with AKM (2019) shock-clustered SEs. The result must carry the fields listed under `iv_result` below.

Parameters:

Name	Type	Description	Default
`iv_result`	`EconometricResults`	An IV estimation result whose `data_info` contains `residuals` and `fitted_values`, plus the residualised instrument values (stored by the Bartik estimator).	required
`shares`	`array-like of shape (n, K)`	Exposure-share matrix.	required
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`EconometricResults`	A new result object with AKM-corrected standard errors.

Notes

This function requires that the IV result's `data_info` contains `'residuals'`. It computes the AKM variance from these residuals and the shares; for the instrument residuals, it uses the first-stage fitted values (`fitted_values`).

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n, K = 80, 5
>>> S = pd.DataFrame(rng.dirichlet(np.ones(K), size=n),
...                  columns=[f"ind{k}" for k in range(K)])
>>> g = pd.Series(rng.normal(0, 1, K), index=S.columns)
>>> bartik = S.values @ g.values
>>> emp = 0.7 * bartik + rng.normal(0, 0.5, n)
>>> wage = 1.5 * emp + rng.normal(0, 0.5, n)
>>> df = pd.DataFrame({"wage": wage, "emp": emp})
>>> iv_res = sp.bartik(df, y='wage', endog='emp', shares=S, shocks=g,
...                    leave_one_out=False)
>>> corrected = sp.shift_share_se(iv_res, shares=S.values)
>>> bool("SE (AKM)" in corrected.diagnostics)
True

shift_share_political

shift_share_political(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[Sequence[str]] = None, leave_one_out: bool = True, alpha: float = 0.05) -> ShiftSharePoliticalResult

Park-Xu (2026) shift-share IV for political-science panel data.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame (long format)`	Unit × time panel containing `outcome` and `endog`. First and last periods per unit are used to form long-differences.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time-period column.	required
`outcome`	`str`	Outcome column.	required
`endog`	`str`	Endogenous regressor column.	required
`shares`	`DataFrame (unit × industry)`	Exposure-share matrix. Row index must equal the unit IDs.	required
`shocks`	`Series (industry → scalar)`	National / supra-unit shifter vector. Index must match the columns of `shares`.	required
`covariates`	`sequence of str`	Pre-treatment covariates (measured at the first period per unit) used for the share-balance diagnostic.	`None`
`leave_one_out`	`bool`	Forwarded to `sp.bartik`.	`True`
`alpha`	`float`	Significance level, forwarded to `sp.bartik`.	`0.05`

Returns:

Type	Description
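`ShiftSharePoliticalResult`	Result object holding the point estimate, standard errors, and the share-balance and Rotemberg top-K diagnostics.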

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units = range(20); times = range(2); inds = [f'I{k}' for k in range(5)]
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(5), size=len(units)),
...                       index=list(units), columns=inds)
>>> shocks = pd.Series(rng.normal(size=5), index=inds)
>>> rows = []
>>> true_tau = 0.4
>>> for i in units:
...     bartik_i = float((shares.loc[i] * shocks).sum())
...     dx = bartik_i + rng.normal(scale=0.1)
...     y_first = 0.0
...     y_last = y_first + true_tau * dx + rng.normal(scale=0.1)
...     rows.append({'unit': i, 'time': 0, 'y': y_first, 'x': 0.0})
...     rows.append({'unit': i, 'time': 1, 'y': y_last, 'x': dx})
>>> df = pd.DataFrame(rows)
>>> out = sp.shift_share_political(
...     df, unit='unit', time='time',
...     outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - true_tau) < 0.3
True
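The long-difference step mentioned under `data` (first and last period per unit) can be sketched with pandas. The helper `long_difference` is hypothetical; it assumes no missing values, because `GroupBy.first` and `GroupBy.last` return the first and last non-null entries rather than the values at the earliest and latest periods when NaNs are present.

```python
import pandas as pd


def long_difference(df: pd.DataFrame, unit: str, time: str, cols: list) -> pd.DataFrame:
    """Per-unit last-period minus first-period values (assumes no missing values)."""
    g = df.sort_values([unit, time]).groupby(unit)[cols]
    return g.last() - g.first()


panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2],
    "time": [2, 0, 1, 0, 1, 2],          # deliberately unsorted
    "y":    [5.0, 1.0, 9.0, 0.0, 0.0, 3.0],
    "x":    [2.0, 0.5, 1.0, 0.0, 1.0, 1.5],
})
diffs = long_difference(panel, "unit", "time", ["y", "x"])
print(diffs)
```

Sorting by time before grouping matters: without it, unit 1 would difference its rows in file order rather than its earliest and latest periods.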

shift_share_political_panel

shift_share_political_panel(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: Any, shocks: Any, covariates: Optional[Sequence[str]] = None, cluster: str = 'unit', alpha: float = 0.05, fe: str = 'two-way') -> ShiftSharePoliticalPanelResult

Multi-period panel shift-share IV (Park-Xu 2026 §4.2).

Pooled 2SLS with unit, time, or two-way fixed effects, using the period-specific Bartik instrument:

Z_{it} = sum_k s_{ikt} · g_{kt}

The share matrix can be time-invariant (DataFrame indexed by unit) or time-varying (dict[time → DataFrame]); the shock vector can likewise be time-invariant (Series) or time-varying (DataFrame indexed by time, or dict[time → Series]).

Parameters:

Name	Type	Description	Default
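`data`	`DataFrame (long format)`	Unit × time panel containing `outcome` and `endog`.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time-period column.	required
`outcome`	`str`	Outcome column.	required
`endog`	`str`	Endogenous regressor column.	required
`shares`	`DataFrame or dict[time → DataFrame]`	Exposure shares: a unit × industry DataFrame (time-invariant) or a dict mapping each period to one (time-varying).	required
`shocks`	`Series, DataFrame(time × industry), or dict[time → Series]`	Industry shocks: a Series (time-invariant), a time × industry DataFrame, or a dict mapping each period to a Series.	required
`covariates`	`sequence of str`	Time-varying controls.	`None`
`cluster`	`('unit', 'time', 'twoway')`	Cluster structure for the panel SE.	`'unit'`
`alpha`	`float`	Significance level.	`0.05`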
`fe`	`('two-way', 'unit', 'time', 'none')`	Fixed-effect structure. `'two-way'` is the Park-Xu default.	`'two-way'`

Returns:

Type	Description
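`ShiftSharePoliticalPanelResult`	Pooled estimate with per-period estimates, aggregated Rotemberg weights, and share-balance diagnostics.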

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units, times, inds = list(range(30)), [0, 1, 2, 3], list("AB")
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(2), size=len(units)),
...                       index=units, columns=inds)
>>> shocks = pd.DataFrame(
...     rng.normal(size=(len(times), 2)), index=times, columns=inds,
... )
>>> rows = []
>>> tau = 0.3
>>> for i in units:
...     for t in times:
...         b = float((shares.loc[i] * shocks.loc[t]).sum())
...         x = b + rng.normal(scale=0.1)
...         y = tau * x + rng.normal(scale=0.1)
...         rows.append({'u': i, 't': t, 'y': y, 'x': x})
>>> df = pd.DataFrame(rows)
>>> out = sp.shift_share_political_panel(
...     df, unit='u', time='t', outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - tau) < 0.15
True
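For the time-invariant-shares, time-varying-shocks case, the instrument Z_{it} = sum_k s_{ik} g_{kt} and the `'two-way'` fixed-effect estimator can be sketched as below. The helpers `panel_bartik` and `twoway_demean` are hypothetical; the within transformation shown is exact only for a balanced panel, and the sketch ignores covariates and clustering.

```python
import numpy as np
import pandas as pd


def panel_bartik(shares: pd.DataFrame, shocks: pd.DataFrame) -> pd.Series:
    """Z_it = sum_k s_ik * g_kt for unit x industry shares and time x industry shocks."""
    z = shares.values @ shocks[shares.columns].values.T  # n_units x n_times
    idx = pd.MultiIndex.from_product([shares.index, shocks.index], names=["unit", "time"])
    return pd.Series(z.ravel(), index=idx, name="Z")     # row-major: unit outer, time inner


def twoway_demean(v: pd.Series) -> pd.Series:
    """Two-way within transformation (exact for a balanced panel)."""
    unit_mean = v.groupby(level="unit").transform("mean")
    time_mean = v.groupby(level="time").transform("mean")
    return v - unit_mean - time_mean + v.mean()


rng = np.random.default_rng(1)
units, times, inds = list(range(200)), list(range(6)), ["A", "B"]
shares = pd.DataFrame(rng.dirichlet(np.ones(2), size=len(units)), index=units, columns=inds)
shocks = pd.DataFrame(rng.normal(size=(len(times), 2)), index=times, columns=inds)
Z = panel_bartik(shares, shocks)
n = len(Z)
unit_fe = np.repeat(rng.normal(size=len(units)), len(times))  # matches unit-outer order
time_fe = np.tile(rng.normal(size=len(times)), len(units))    # matches time-inner order
u = rng.normal(scale=0.1, size=n)                             # confounder
x = Z.values + unit_fe + time_fe + u + rng.normal(scale=0.1, size=n)
y = 0.3 * x + unit_fe - time_fe + u + rng.normal(scale=0.1, size=n)

zd = twoway_demean(Z)
xd = twoway_demean(pd.Series(x, index=Z.index))
yd = twoway_demean(pd.Series(y, index=Z.index))
beta = (zd * yd).sum() / (zd * xd).sum()  # pooled just-identified 2SLS slope
print(f"TWFE Bartik slope: {beta:.3f}")
```

With time-invariant shares, the unit and time effects absorb the levels of Z, so identification comes only from how each unit's fixed share mix interacts with changes in the relative industry shocks over time.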
