statspai.bartik

bartik

Shift-Share (Bartik) Instrumental Variables for StatsPAI.

Constructs Bartik instruments from industry shares and national shocks, with diagnostics for instrument validity following Goldsmith-Pinkham, Sorkin, and Swift (2020) and Borusyak, Hull, and Jaravel (2022).
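
As a rough illustration of what "industry shares and national shocks" means, the sketch below builds the instrument B_i = sum_k s_{ik} g_k by hand with pandas. The region and industry names and all numbers are made up for the example. It does not reproduce StatsPAI's internal code.

```python
import pandas as pd

# Hypothetical toy data: 3 regions, 2 industries (all values are made up).
shares = pd.DataFrame(
    [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]],
    index=["r1", "r2", "r3"],
    columns=["manuf", "services"],
)
shocks = pd.Series({"services": 0.05, "manuf": -0.10})

# Align the shock vector to the share columns before multiplying,
# so a different industry ordering cannot silently mismatch.
aligned_shocks = shocks.reindex(shares.columns)

# B_i = sum_k s_ik * g_k: share-weighted sum of national shocks per region.
bartik_iv = shares.mul(aligned_shocks, axis=1).sum(axis=1)
print(bartik_iv)
```

Reindexing the shocks explicitly mirrors the alignment requirements stated for the shares and shocks arguments further down.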

References

Goldsmith-Pinkham, P., Sorkin, I., and Swift, H. (2020). "Bartik Instruments: What, When, Why, and How." American Economic Review, 110(8), 2586-2624.

Borusyak, K., Hull, P., and Jaravel, X. (2022). "Quasi-Experimental Shift-Share Research Designs." Review of Economic Studies, 89(1), 181-213.

BartikIV

Bartik Shift-Share IV estimator.

rotemberg_weights property

rotemberg_weights: DataFrame

Rotemberg weight decomposition by industry.
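
For intuition, the following sketch computes Rotemberg weights by hand in the form given by Goldsmith-Pinkham, Sorkin, and Swift (2020): alpha_k = g_k Z_k' x_perp / sum_k' g_k' Z_k'' x_perp, where Z_k is the share column of industry k and x_perp is the endogenous regressor residualised on the controls. The simulated data, the variable names, and the constant-only control set are assumptions of this example. The columns actually returned by rotemberg_weights may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 4
S = rng.dirichlet(np.ones(K), size=n)         # shares, each row sums to 1
g = rng.normal(size=K)                        # national shocks
B = S @ g                                     # Bartik instrument
x = B + rng.normal(scale=0.5, size=n)         # endogenous regressor
y = 0.4 * x + rng.normal(scale=0.5, size=n)   # outcome

# Residualise on a constant only (no further controls in this sketch).
x_perp = x - x.mean()
y_perp = y - y.mean()

# Rotemberg weights: each industry's share of the instrument's first-stage covariance.
numerators = g * (S.T @ x_perp)
alpha_k = numerators / numerators.sum()

# Just-identified IV estimate using each industry share as the sole instrument.
beta_k = (S.T @ y_perp) / (S.T @ x_perp)

# The Bartik IV estimate equals the Rotemberg-weighted sum of the beta_k.
beta_bartik = (B @ y_perp) / (B @ x_perp)
print(alpha_k, beta_bartik, (alpha_k * beta_k).sum())
```

The weights sum to one but can be negative, which is why the decomposition is usually reported per industry rather than summarised by a single number.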

fit

Fit Bartik IV via 2SLS.
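
A minimal sketch of just-identified 2SLS with a single excluded instrument, written in plain NumPy. The helper name two_sls and the simulated data are hypothetical and only illustrate the estimator that fit refers to. The method's actual implementation and return type are not shown here.

```python
import numpy as np

def two_sls(y, x, z, controls=None):
    """Just-identified 2SLS of y on x, instrumenting x with z (hypothetical helper)."""
    n = len(y)
    const = np.ones(n)
    W = const[:, None] if controls is None else np.column_stack([const, controls])
    X = np.column_stack([x, W])   # regressors: endogenous variable first
    Z = np.column_stack([z, W])   # instruments: excluded instrument plus controls
    # With as many instruments as regressors, 2SLS reduces to (Z'X)^{-1} Z'y.
    return np.linalg.solve(Z.T @ X, Z.T @ y)

# Simulated data with a confounder u that biases OLS upwards.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + rng.normal(size=n)
y = 0.4 * x + u + rng.normal(size=n)

beta_iv = two_sls(y, x, z)[0]
xc, yc = x - x.mean(), y - y.mean()
beta_ols = (xc @ yc) / (xc @ xc)
print(beta_iv, beta_ols)
```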

ShiftSharePoliticalResult dataclass

Structured output of shift_share_political().

Wraps a standard CausalResult (point estimate and standard errors) together with the two Park-Xu (2026) diagnostics: share balance and the top-K Rotemberg weights.

ShiftSharePoliticalPanelResult dataclass

Structured output of shift_share_political_panel().

Attributes:

Name Type Description
estimate float

Pooled 2SLS coefficient on endog.

se float

Panel-clustered standard error. Clustered by unit by default; the shock-clustered alternative from the underlying AKM correction is stored in diagnostics['akm_se'].

ci tuple

(lower, upper) at alpha.

per_period DataFrame

Per-period cross-sectional estimates (one row per time), useful for event-study-style dynamic effects.

rotemberg_panel DataFrame

Rotemberg weights aggregated across periods (industries × stats).

share_balance DataFrame
n_units int
n_periods int
n_industries int
method str
diagnostics dict

Estimator-side diagnostics: fe (mode), cluster, akm_se, n_obs, first_stage_F.

model_info dict

Output-layer metadata: model_type, method, fixed_effects (column-name list as "unit+time"), cluster. Consumed by statspai.regtable to render the fixed-effect and cluster rows automatically.

bartik

bartik(data: DataFrame, y: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[List[str]] = None, leave_one_out: bool = True, regional_shocks: Optional[DataFrame] = None, robust: str = 'hc1', alpha: float = 0.05) -> EconometricResults

Estimate using Shift-Share (Bartik) instrumental variables.

Parameters:

Name Type Description Default
data DataFrame

Cross-sectional data with one row per region/unit.

required
y str

Outcome variable.

required
endog str

Endogenous regressor (e.g., local employment growth).

required
shares DataFrame

Share matrix (n_units x n_industries). Rows = regions, cols = industries. Row index must align with data index.

required
shocks Series

National shock vector (n_industries,). Index = industry names.

required
covariates list of str

Exogenous control variables.

None
leave_one_out bool

Compute leave-one-out shocks (exclude the own region from the national average). Only takes effect when regional_shocks is also supplied: without the per-region industry growth panel there is not enough information to reconstruct g_k excluding region i. When leave_one_out=True but regional_shocks is not provided, a UserWarning is issued and the estimator falls back to the simple Bartik instrument.

True
regional_shocks DataFrame

Regional industry growth matrix (n_units x n_industries). Row i, column k is the realised growth of industry k in region i. When provided with leave_one_out=True, the instrument uses g_k^{-i} = (sum_j g_{jk} - g_{ik}) / (n - 1) (Borusyak-Hull-Jaravel 2022-style exact leave-one-out). Row index must align with shares; columns must be a superset of shocks.index.

None
robust str

Standard error type.

'hc1'
alpha float

Significance level.

0.05

Returns:

Type Description
EconometricResults

2SLS results with Bartik IV diagnostics.

Examples:

>>> # shares: DataFrame (regions x industries), shocks: Series (industries)
>>> result = bartik(df, y='wage_growth', endog='emp_growth',
...                shares=share_matrix, shocks=national_growth)
>>> print(result.summary())
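
The leave-one-out formula documented for regional_shocks, g_k^{-i} = (sum_j g_{jk} - g_{ik}) / (n - 1), can be sketched in a few lines. The toy matrices below are made up, and the code only illustrates the arithmetic. How bartik handles missing values or misaligned indices is not covered.

```python
import numpy as np
import pandas as pd

regions, industries = ["r1", "r2", "r3"], ["A", "B"]

# Hypothetical regional growth g_ik and shares s_ik (values are made up).
regional_shocks = pd.DataFrame(
    [[0.02, 0.10], [0.04, 0.00], [0.06, -0.10]],
    index=regions, columns=industries,
)
shares = pd.DataFrame(
    [[0.5, 0.5], [0.8, 0.2], [0.1, 0.9]],
    index=regions, columns=industries,
)

g = regional_shocks.to_numpy()
n = g.shape[0]

# g_k^{-i}: column total minus the own-region value, averaged over the other n - 1 regions.
loo = (g.sum(axis=0, keepdims=True) - g) / (n - 1)
loo_shocks = pd.DataFrame(loo, index=regions, columns=industries)

# Leave-one-out instrument: each region uses its own row of shocks.
loo_iv = (shares * loo_shocks).sum(axis=1)
print(loo_shocks)
print(loo_iv)
```

Unlike the simple instrument, every region gets its own shock vector here, which is why the per-region growth matrix is required.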

ssaggregate

ssaggregate(data: DataFrame, y: str, x: str, shares: ndarray, shocks: Union[str, ndarray, Series] = None, shock_data: Optional[DataFrame] = None, controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> EconometricResults

Shift-share IV estimation with standard errors corrected following Adão, Kolesár, and Morales (AKM, 2019).

Estimates a 2SLS regression where the instrument is a Bartik (shift-share) variable B_i = sum_k s_{ik} g_k, and corrects the variance-covariance matrix to account for cross-sectional correlation induced by shared shocks.
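
To make the correction concrete, here is a sketch of an AKM-style standard error for the reduced-form (OLS on the Bartik variable) case. It assumes the variance takes the form sum_k xhat_k^2 R_k^2 / (sum_i xtilde_i^2)^2, with R_k = sum_i s_{ik} e_i and xhat obtained by projecting the residualised Bartik variable on the shares. The function name and the simulated data are hypothetical, and ssaggregate may differ in details such as degrees-of-freedom adjustments.

```python
import numpy as np

def akm_se_reduced_form(y, bartik, shares, controls=None):
    """AKM-style SE for OLS of y on a shift-share regressor (illustrative sketch)."""
    n = len(y)
    const = np.ones(n)
    C = const[:, None] if controls is None else np.column_stack([const, controls])

    def residualise(v):
        # Partial out the constant and any controls.
        return v - C @ np.linalg.lstsq(C, v, rcond=None)[0]

    x_t, y_t = residualise(bartik), residualise(y)
    beta = (x_t @ y_t) / (x_t @ x_t)
    eps = y_t - beta * x_t

    # Shock-level components: project the residualised regressor on the shares.
    x_hat = np.linalg.lstsq(shares, x_t, rcond=None)[0]
    # Share-weighted residual totals, one per shock.
    R = shares.T @ eps
    variance = np.sum(x_hat**2 * R**2) / (x_t @ x_t) ** 2
    return beta, np.sqrt(variance)

rng = np.random.default_rng(0)
n, K = 300, 10
S = rng.dirichlet(np.ones(K), size=n)
g = rng.normal(size=K)
B = S @ g
y = 0.5 * B + rng.normal(scale=0.1, size=n)

beta_hat, se_akm = akm_se_reduced_form(y, B, S)
print(beta_hat, se_akm)
```

The key difference from a heteroskedasticity-robust variance is that residuals are first aggregated to the shock level, so units with similar shares are allowed to have correlated errors.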

Parameters:

Name Type Description Default
data DataFrame

Observation-level data (n rows).

required
y str

Outcome variable name.

required
x str

Endogenous regressor / constructed Bartik IV column in data. If the column is the constructed Bartik instrument itself (i.e. the reduced-form specification), the estimator runs OLS and corrects SEs. If it is an endogenous regressor, a Bartik IV is constructed from shares and shocks for the first stage.

required
shares array-like of shape (n, K)

Exposure-share matrix. shares[i, k] is unit i's exposure to shock k.

required
shocks str or array-like of shape (K,)

Shock vector. Either a 1-D array/Series of length K, or a column name in shock_data. If None, the Bartik variable in x is used directly (reduced-form mode).

None
shock_data DataFrame

Shock-level DataFrame (K rows) when shocks is a column name.

None
controls list of str

Exogenous control variables.

None
cluster str

Observation-level cluster variable (not used in the AKM variance; retained for compatibility and diagnostics).

None
alpha float

Significance level.

0.05

Returns:

Type Description
EconometricResults

With AKM-corrected standard errors.

Examples:

>>> result = sp.ssaggregate(
...     data=df,
...     y="employment_growth",
...     x="bartik_instrument",
...     shares=shares_matrix,
...     shocks="industry_growth",
...     shock_data=df_shocks,
...     controls=["population", "density"],
... )
>>> print(result.summary())

shift_share_se

shift_share_se(iv_result: EconometricResults, shares: ndarray, alpha: float = 0.05) -> EconometricResults

Correct standard errors of an existing IV result for shift-share structure.

Takes an EconometricResults from any StatsPAI IV estimator and replaces the SEs with AKM (2019) shock-clustered SEs.

Parameters:

Name Type Description Default
iv_result EconometricResults

An IV estimation result whose data_info contains residuals and fitted_values, plus the residualised instrument values (stored by the Bartik estimator).

required
shares array-like of shape (n, K)

Exposure-share matrix.

required
alpha float

Significance level.

0.05

Returns:

Type Description
EconometricResults

A new result object with AKM-corrected standard errors.

Notes

This function requires that the IV result's data_info contains 'residuals'. It computes the AKM variance using the residuals and shares. For the instrument residuals, it uses the fitted values from the first stage (fitted_values).

Examples:

>>> iv_res = sp.bartik(df, y='wage', endog='emp',
...                    shares=S, shocks=g)
>>> corrected = sp.shift_share_se(iv_res, shares=S)
>>> print(corrected.summary())

shift_share_political

shift_share_political(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[Sequence[str]] = None, leave_one_out: bool = True, alpha: float = 0.05) -> ShiftSharePoliticalResult

Park-Xu (2026) shift-share IV for political-science panel data.

Parameters:

Name Type Description Default
data DataFrame (long format)

Unit × time panel containing outcome and endog. First and last periods per unit are used to form long-differences.

required
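unit str

Name of the unit identifier column in data.

required
time str

Name of the time period column in data.

required
outcome str

Name of the outcome column in data.

required
endog str

Name of the endogenous regressor column in data.

required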
shares DataFrame (unit × industry)

Exposure-share matrix. Row index must equal the unit IDs.

required
shocks Series (industry → scalar)

National / supra-unit shifter vector. Index must match the columns of shares.

required
covariates sequence of str

Pre-treatment covariates (measured at the first period per unit) used for the share-balance diagnostic.

None
leave_one_out bool

Forwarded to sp.bartik.

True
alpha float

Significance level. Forwarded to sp.bartik.

0.05

Returns:

Type Description
ShiftSharePoliticalResult

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units = range(20); times = range(2); inds = [f'I{k}' for k in range(5)]
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(5), size=len(units)),
...                       index=list(units), columns=inds)
>>> shocks = pd.Series(rng.normal(size=5), index=inds)
>>> rows = []
>>> true_tau = 0.4
>>> for i in units:
...     bartik_i = float((shares.loc[i] * shocks).sum())
...     dx = bartik_i + rng.normal(scale=0.1)
...     y_first = 0.0
...     y_last = y_first + true_tau * dx + rng.normal(scale=0.1)
...     rows.append({'unit': i, 'time': 0, 'y': y_first, 'x': 0.0})
...     rows.append({'unit': i, 'time': 1, 'y': y_last, 'x': dx})
>>> df = pd.DataFrame(rows)
>>> out = sp.shift_share_political(
...     df, unit='unit', time='time',
...     outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - true_tau) < 0.3
True
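
The data parameter above notes that the first and last periods per unit are used to form long differences. A pandas sketch of that step, under the assumption that "first" and "last" are taken after sorting by the time column, might look as follows. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical long-format panel: 2 units observed in 3 periods each.
panel = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2, 2],
    "time": [0, 1, 2, 0, 1, 2],
    "y":    [1.0, 1.5, 3.0, 0.0, 0.2, -1.0],
    "x":    [0.0, 0.4, 1.0, 0.5, 0.5, 0.0],
})

# Sort so that first()/last() within each unit follow the time ordering.
ordered = panel.sort_values(["unit", "time"])
grouped = ordered.groupby("unit")[["y", "x"]]

# Long difference: last-period value minus first-period value, one row per unit.
long_diff = grouped.last() - grouped.first()
print(long_diff)
```

Intermediate periods drop out entirely, so the resulting cross-section has one row per unit, which matches the cross-sectional input expected by bartik.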

shift_share_political_panel

shift_share_political_panel(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: Any, shocks: Any, covariates: Optional[Sequence[str]] = None, cluster: str = 'unit', alpha: float = 0.05, fe: str = 'two-way') -> ShiftSharePoliticalPanelResult

Multi-period panel shift-share IV (Park-Xu 2026 §4.2).

Pooled 2SLS with unit / time / two-way fixed effects, using the period-specific Bartik instrument

Z_{it} = sum_k s_{ikt} · g_{kt}

The share matrix can be time-invariant (DataFrame indexed by unit) or time-varying (dict[time → DataFrame]); the shock vector can similarly be constant over time (Series) or time-varying (DataFrame indexed by time / dict[time → Series]).
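
For the common case of time-invariant shares and a time × industry shock DataFrame, the period-specific instrument can be sketched as a single matrix product. The unit labels, years, and values below are made up, and the long-format column names unit, time, and Z are choices of this example rather than names used by the library.

```python
import pandas as pd

# Hypothetical inputs: shares are unit x industry, shocks are time x industry.
shares = pd.DataFrame(
    [[0.6, 0.4], [0.1, 0.9]], index=["u1", "u2"], columns=["A", "B"],
)
shocks = pd.DataFrame(
    [[0.1, -0.2], [0.3, 0.0]], index=[2000, 2001], columns=["A", "B"],
)

# Z[i, t] = sum_k s_ik * g_kt; selecting shares.columns aligns the industries.
Z_wide = shares.dot(shocks[shares.columns].T)   # unit x time

# Reshape to long format so it can be merged onto a unit-time panel.
Z_long = (
    Z_wide.stack()
    .rename("Z")
    .rename_axis(["unit", "time"])
    .reset_index()
)
print(Z_long)
```

With time-varying shares the product would instead be computed period by period, one share matrix per time key.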

Parameters:

Name Type Description Default
data DataFrame (long format)
required
unit str
required
time str
required
outcome str
required
endog str
required
shares DataFrame or dict[time → DataFrame]
required
shocks Series, DataFrame(time × industry), or dict[time → Series]
required
covariates sequence of str

Time-varying controls.

None
cluster ('unit', 'time', 'twoway')

Cluster structure for the panel SE.

'unit'
alpha float
0.05
fe ('two-way', 'unit', 'time', 'none')

Fixed-effect structure. 'two-way' is the Park-Xu default.

'two-way'
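
As background for the fe='two-way' option, the sketch below applies the textbook within transformation (value minus unit mean minus time mean plus grand mean) to a small panel. The helper demean_two_way is hypothetical, and the formula is exact only for balanced panels. Whether the function uses this transformation or dummy variables internally is not stated here.

```python
import pandas as pd

def demean_two_way(df, cols, unit, time):
    """Two-way within transformation for a balanced panel (hypothetical helper)."""
    unit_mean = df.groupby(unit)[cols].transform("mean")
    time_mean = df.groupby(time)[cols].transform("mean")
    grand_mean = df[cols].mean()
    return df[cols] - unit_mean - time_mean + grand_mean

# Balanced toy panel where y is purely a unit effect plus a time effect.
unit_effect = {"a": 1.0, "b": -2.0, "c": 0.5}
time_effect = {0: 0.0, 1: 3.0}
rows = [
    {"u": u, "t": t, "y": unit_effect[u] + time_effect[t]}
    for u in unit_effect for t in time_effect
]
panel = pd.DataFrame(rows)

y_within = demean_two_way(panel, ["y"], unit="u", time="t")
print(y_within)
```

Because y contains nothing but additive unit and time effects, the transformed column is zero everywhere, which is a quick check that both sets of effects are absorbed.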

Returns:

Type Description
ShiftSharePoliticalPanelResult

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units, times, inds = list(range(30)), [0, 1, 2, 3], list("AB")
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(2), size=len(units)),
...                       index=units, columns=inds)
>>> shocks = pd.DataFrame(
...     rng.normal(size=(len(times), 2)), index=times, columns=inds,
... )
>>> rows = []
>>> tau = 0.3
>>> for i in units:
...     for t in times:
...         b = float((shares.loc[i] * shocks.loc[t]).sum())
...         x = b + rng.normal(scale=0.1)
...         y = tau * x + rng.normal(scale=0.1)
...         rows.append({'u': i, 't': t, 'y': y, 'x': x})
>>> df = pd.DataFrame(rows)
>>> out = sp.shift_share_political_panel(
...     df, unit='u', time='t', outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - tau) < 0.15
True