statspai.bartik
bartik
Shift-Share (Bartik) Instrumental Variables for StatsPAI.
Constructs Bartik instruments from industry shares and national shocks, with diagnostics for instrument validity following Goldsmith-Pinkham, Sorkin, and Swift (2020) and Borusyak, Hull, and Jaravel (2022).
References
Goldsmith-Pinkham, P., Sorkin, I., and Swift, H. (2020). "Bartik Instruments: What, When, Why, and How." American Economic Review, 110(8), 2586-2624. [@goldsmithpinkham2020bartik]
Borusyak, K., Hull, P., and Jaravel, X. (2022). "Quasi-Experimental Shift-Share Research Designs." Review of Economic Studies, 89(1), 181-213. [@borusyak2022quasi]
BartikIV
Bartik Shift-Share IV estimator.
rotemberg_weights (property)
Rotemberg weight decomposition by industry.
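As a rough illustration of what such a decomposition contains, Goldsmith-Pinkham, Sorkin, and Swift (2020) define the weight on industry k as α_k = g_k Z_k'x⊥ / Σ_j g_j Z_j'x⊥, where Z_k is the column of industry-k shares and x⊥ is the endogenous regressor residualized on the controls. The sketch below is plain Python and assumes the only control is a constant, so x⊥ is simply the demeaned regressor. The helper name `rotemberg_weights_sketch` and the toy inputs are hypothetical; this is not the code behind the `rotemberg_weights` property.

```python
def rotemberg_weights_sketch(shares, shocks, x):
    """Illustrative Rotemberg weights (assumption: constant-only controls).

    shares: list of rows, shares[i][k] = exposure of region i to industry k
    shocks: list, shocks[k] = national shock g_k
    x:      list, endogenous regressor per region
    """
    n = len(x)
    x_bar = sum(x) / n
    # With only a constant as control, residualizing x means demeaning it.
    x_perp = [xi - x_bar for xi in x]
    # Unnormalized weight for industry k: g_k * sum_i s_ik * x_perp_i
    raw = [
        g_k * sum(shares[i][k] * x_perp[i] for i in range(n))
        for k, g_k in enumerate(shocks)
    ]
    total = sum(raw)
    # Normalize so the weights sum to one across industries.
    return [r / total for r in raw]


# Toy data: three regions, two industries.
shares = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
shocks = [1.0, -0.5]
x = [0.9, 0.3, -0.4]
alpha = rotemberg_weights_sketch(shares, shocks, x)
```

The weights sum to one by construction, but individual weights can be negative, which is one reason the largest ones are usually inspected first.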
ShiftSharePoliticalResult (dataclass)
Structured output of `shift_share_political`.
Wraps a standard `CausalResult` (point estimate and SEs) plus the two Park-Xu (2026) diagnostics: share-balance and Rotemberg top-K.
ShiftSharePoliticalPanelResult (dataclass)
Structured output of `shift_share_political_panel`.
Attributes:
| Name | Type | Description |
|---|---|---|
| `estimate` | `float` | Pooled 2SLS coefficient on |
| `se` | `float` | Panel-clustered SE (unit by default; shock-clustered available via the underlying AKM correction; stored in |
| `ci` | `tuple` | |
| `per_period` | `DataFrame` | Per-period cross-sectional estimates (one row per |
| `rotemberg_panel` | `DataFrame` | Rotemberg weights aggregated across periods (industries × stats). |
| `share_balance` | `DataFrame` | |
| `n_units` | `int` | |
| `n_periods` | `int` | |
| `n_industries` | `int` | |
| `method` | `str` | |
| `diagnostics` | `dict` | Estimator-side diagnostics: |
| `model_info` | `dict` | Output-layer metadata: |
bartik
bartik(data: DataFrame, y: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[List[str]] = None, leave_one_out: bool = True, regional_shocks: Optional[DataFrame] = None, robust: str = 'hc1', alpha: float = 0.05) -> EconometricResults
Estimate using Shift-Share (Bartik) instrumental variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Cross-sectional data with one row per region/unit. | required |
| `y` | `str` | Outcome variable. | required |
| `endog` | `str` | Endogenous regressor (e.g., local employment growth). | required |
| `shares` | `DataFrame` | Share matrix (n_units x n_industries). Rows = regions, cols = industries. Row index must align with data index. | required |
| `shocks` | `Series` | National shock vector (n_industries,). Index = industry names. | required |
| `covariates` | `list of str` | Exogenous control variables. | `None` |
| `leave_one_out` | `bool` | Compute leave-one-out shocks (exclude own region from national average). Only takes effect when | `True` |
| `regional_shocks` | `DataFrame` | Regional industry growth matrix (n_units x n_industries). Row | `None` |
| `robust` | `str` | Standard error type. | `'hc1'` |
| `alpha` | `float` | Significance level. | `0.05` |
Returns:
| Type | Description |
|---|---|
| `EconometricResults` | 2SLS results with Bartik IV diagnostics. |
Examples:
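The following is a minimal sketch of the leave-one-out idea described for `leave_one_out` and `regional_shocks`, written in plain Python rather than as a call into StatsPAI. It assumes an unweighted average over the other regions; the documentation above only states that the own region is excluded from the national average, so any weighting scheme is an assumption here. The helper names `leave_one_out_shocks` and `bartik_instrument_loo` are hypothetical.

```python
def leave_one_out_shocks(regional_growth):
    """For each region i, average industry growth over all other regions.

    regional_growth[i][k] = growth of industry k in region i.
    Returns loo[i][k]; the unweighted mean is an assumption of this sketch.
    """
    n = len(regional_growth)
    n_ind = len(regional_growth[0])
    col_sums = [sum(row[k] for row in regional_growth) for k in range(n_ind)]
    return [
        [(col_sums[k] - regional_growth[i][k]) / (n - 1) for k in range(n_ind)]
        for i in range(n)
    ]


def bartik_instrument_loo(shares, loo_shocks):
    """B_i = sum_k s_ik * g_{k,-i}, using region-specific leave-one-out shocks."""
    return [
        sum(s * g for s, g in zip(share_row, shock_row))
        for share_row, shock_row in zip(shares, loo_shocks)
    ]


# Toy data: three regions, two industries.
shares = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
regional_growth = [[0.10, 0.02], [0.04, 0.06], [0.01, 0.07]]
loo = leave_one_out_shocks(regional_growth)
z = bartik_instrument_loo(shares, loo)
```

Excluding the own region keeps a region's own industry growth from entering its instrument, which would otherwise reintroduce the local variation the instrument is meant to avoid.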
ssaggregate
ssaggregate(data: DataFrame, y: str, x: str, shares: ndarray, shocks: Union[str, ndarray, Series] = None, shock_data: Optional[DataFrame] = None, controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> EconometricResults
Shift-share IV estimation with AKM (Adão, Kolesár, and Morales, 2019) corrected standard errors.
Estimates a 2SLS regression where the instrument is a Bartik (shift-share) variable B_i = sum_k s_{ik} g_k, and corrects the variance-covariance matrix to account for cross-sectional correlation induced by shared shocks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Observation-level data (n rows). | required |
| `y` | `str` | Outcome variable name. | required |
| `x` | `str` | Endogenous regressor / constructed Bartik IV column in data. If the column is the constructed Bartik instrument itself (i.e. the reduced-form specification), the estimator runs OLS and corrects SEs. If it is an endogenous regressor, a Bartik IV is constructed from shares and shocks for the first stage. | required |
| `shares` | `array-like of shape (n, K)` | Exposure-share matrix. | required |
| `shocks` | `str or array-like of shape (K,)` | Shock vector. Either a 1-D array/Series of length K, or a column name in shock_data. If | `None` |
| `shock_data` | `DataFrame` | Shock-level DataFrame (K rows) when shocks is a column name. | `None` |
| `controls` | `list of str` | Exogenous control variables. | `None` |
| `cluster` | `str` | Observation-level cluster variable (not used in the AKM variance; retained for compatibility and diagnostics). | `None` |
| `alpha` | `float` | Significance level. | `0.05` |
Returns:
| Type | Description |
|---|---|
| `EconometricResults` | With AKM-corrected standard errors. |
Examples:
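The sketch below illustrates the two ingredients named in the description, the instrument B_i = sum_k s_{ik} g_k and a just-identified IV slope, in plain Python. It assumes no controls beyond a constant and does not reproduce the AKM variance correction, so it is not a substitute for `ssaggregate`. The helper names `bartik_instrument` and `iv_slope` and the toy data are hypothetical.

```python
def bartik_instrument(shares, shocks):
    """B_i = sum_k s_ik * g_k for every observation i."""
    return [sum(s * g for s, g in zip(row, shocks)) for row in shares]


def iv_slope(y, x, z):
    """Just-identified IV slope with a constant: cov(z, y) / cov(z, x)."""
    n = len(z)
    z_bar, y_bar, x_bar = sum(z) / n, sum(y) / n, sum(x) / n
    cov_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    cov_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return cov_zy / cov_zx


# Toy data: four observations, two industries.
shares = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9], [0.3, 0.7]]
shocks = [1.0, -0.5]
b = bartik_instrument(shares, shocks)

# Noise-free toy outcome: x shifts with the instrument, y = 2 + 0.4 * x.
x = [bi + 0.1 for bi in b]
y = [2.0 + 0.4 * xi for xi in x]
beta = iv_slope(y, x, b)
```

Because the toy data contain no noise, `beta` recovers the slope of 0.4 used to generate `y`; with real data the point estimate is the same ratio, and only the standard errors need the shift-share correction.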
shift_share_se
shift_share_se(iv_result: EconometricResults, shares: ndarray, alpha: float = 0.05) -> EconometricResults
Correct standard errors of an existing IV result for shift-share structure.
Takes an EconometricResults from any StatsPAI IV estimator and
replaces the SEs with AKM (2019) shock-clustered SEs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `iv_result` | `EconometricResults` | An IV estimation result that contains | required |
| `shares` | `array-like of shape (n, K)` | Exposure-share matrix. | required |
| `alpha` | `float` | Significance level. | `0.05` |
Returns:
| Type | Description |
|---|---|
| `EconometricResults` | A new result object with AKM-corrected standard errors. |
Notes
This function requires that the IV result's data_info contains
'residuals'. It computes the AKM variance using the residuals
and shares. For the instrument residuals, it uses the fitted
values from the first stage (fitted_values).
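For orientation, the AKM (2019) variance estimator for a just-identified shift-share IV is commonly written in the form below. This is a sketch of the published estimator under notation introduced here, not a transcription of the StatsPAI code, and how `fitted_values` enters the library's computation is not shown on this page. The symbols are assumptions of this sketch: s_{ik} are the shares, \hat\varepsilon_i the IV residuals, x_i the endogenous regressor, \ddot B_i the Bartik instrument residualized on the controls, S the n × K share matrix, and \hat g_k the coefficients from regressing \ddot B on S.

```latex
\hat V_{\mathrm{AKM}}
  = \frac{\sum_{k=1}^{K} \hat g_k^{2}\,\hat R_k^{2}}
         {\left(\sum_{i=1}^{n} \ddot B_i\, x_i\right)^{2}},
\qquad
\hat R_k = \sum_{i=1}^{n} s_{ik}\,\hat\varepsilon_i,
\qquad
\hat g = (S^{\top} S)^{-1} S^{\top} \ddot B .
```

The numerator aggregates residuals to the shock level before squaring, which is what lets the standard error account for correlation between observations with similar shares.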
Examples:
shift_share_political
shift_share_political(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: DataFrame, shocks: Series, covariates: Optional[Sequence[str]] = None, leave_one_out: bool = True, alpha: float = 0.05) -> ShiftSharePoliticalResult
Park-Xu (2026) shift-share IV for political-science panel data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame (long format)` | Unit × time panel containing | required |
| `unit` | `str` | Column name in `data`. | required |
| `time` | `str` | Column name in `data`. | required |
| `outcome` | `str` | Column name in `data`. | required |
| `endog` | `str` | Column name in `data`. | required |
| `shares` | `DataFrame (unit × industry)` | Exposure-share matrix. Row index must equal the unit IDs. | required |
| `shocks` | `Series (industry → scalar)` | National / supra-unit shifter vector. Index must match the columns of | required |
| `covariates` | `sequence of str` | Pre-treatment covariates (measured at the first period per unit) used for the share-balance diagnostic. | `None` |
| `leave_one_out` | `bool` | Forwarded to :func: | `True` |
| `alpha` | `float` | Forwarded to :func: | `0.05` |
Returns:
| Type | Description |
|---|---|
| `ShiftSharePoliticalResult` | |
Examples:
>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units = range(20); times = range(2); inds = [f'I{k}' for k in range(5)]
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(5), size=len(units)),
...                       index=list(units), columns=inds)
>>> shocks = pd.Series(rng.normal(size=5), index=inds)
>>> rows = []
>>> true_tau = 0.4
>>> for i in units:
...     bartik_i = float((shares.loc[i] * shocks).sum())
...     dx = bartik_i + rng.normal(scale=0.1)
...     y_first = 0.0
...     y_last = y_first + true_tau * dx + rng.normal(scale=0.1)
...     rows.append({'unit': i, 'time': 0, 'y': y_first, 'x': 0.0})
...     rows.append({'unit': i, 'time': 1, 'y': y_last, 'x': dx})
>>> df = pd.DataFrame(rows)
>>> out = sp.bartik.shift_share_political(
...     df, unit='unit', time='time',
...     outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - true_tau) < 0.3
True
shift_share_political_panel
shift_share_political_panel(data: DataFrame, *, unit: str, time: str, outcome: str, endog: str, shares: Any, shocks: Any, covariates: Optional[Sequence[str]] = None, cluster: str = 'unit', alpha: float = 0.05, fe: str = 'two-way') -> ShiftSharePoliticalPanelResult
Multi-period panel shift-share IV (Park-Xu 2026 §4.2).
Pooled 2SLS with unit / time / two-way fixed effects, using the period-specific Bartik instrument:
Z_{it} = sum_k s_{ikt} · g_{kt}
The share matrix can be time-invariant (DataFrame indexed by
unit) or time-varying (dict[time → DataFrame]); the shock
vector can similarly be scalar-in-time (Series) or time-varying
(DataFrame indexed by time / dict[time → Series]).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame (long format)` | | required |
| `unit` | `str` | | required |
| `time` | `str` | | required |
| `outcome` | `str` | | required |
| `endog` | `str` | | required |
| `shares` | `DataFrame or dict[time → DataFrame]` | | required |
| `shocks` | `Series, DataFrame(time × industry), or dict[time → Series]` | | required |
| `covariates` | `sequence of str` | Time-varying controls. | `None` |
| `cluster` | `('unit', 'time', 'twoway')` | Cluster structure for the panel SE. | `'unit'` |
| `alpha` | `float` | | `0.05` |
| `fe` | `('two-way', 'unit', 'time', 'none')` | Fixed-effect structure. | `'two-way'` |
Returns:
| Type | Description |
|---|---|
| `ShiftSharePoliticalPanelResult` | |
Examples:
>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> units, times, inds = list(range(30)), [0, 1, 2, 3], list("AB")
>>> shares = pd.DataFrame(rng.dirichlet(np.ones(2), size=len(units)),
...                       index=units, columns=inds)
>>> shocks = pd.DataFrame(
...     rng.normal(size=(len(times), 2)), index=times, columns=inds,
... )
>>> rows = []
>>> tau = 0.3
>>> for i in units:
...     for t in times:
...         b = float((shares.loc[i] * shocks.loc[t]).sum())
...         x = b + rng.normal(scale=0.1)
...         y = tau * x + rng.normal(scale=0.1)
...         rows.append({'u': i, 't': t, 'y': y, 'x': x})
>>> df = pd.DataFrame(rows)
>>> out = sp.shift_share_political_panel(
...     df, unit='u', time='t', outcome='y', endog='x',
...     shares=shares, shocks=shocks,
... )
>>> abs(out.estimate - tau) < 0.15
True
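To make the period-specific instrument Z_{it} = sum_k s_{ikt} · g_{kt} concrete, here is a small plain-Python sketch that handles both time-invariant and time-varying shares. It uses nested dicts instead of DataFrames, and the helper `panel_bartik` with its `time_varying_shares` flag is hypothetical; how StatsPAI detects the share layout internally is not documented on this page.

```python
def panel_bartik(shares, shocks, units, times, time_varying_shares=False):
    """Z_it = sum_k s_ikt * g_kt, returned as a dict keyed by (unit, time).

    shares: {unit: {industry: share}} when time-invariant, or
            {time: {unit: {industry: share}}} when time_varying_shares=True
    shocks: {time: {industry: shock}}
    """
    z = {}
    for t in times:
        # Pick the share table that applies in period t.
        shares_t = shares[t] if time_varying_shares else shares
        for i in units:
            z[(i, t)] = sum(shares_t[i][k] * g for k, g in shocks[t].items())
    return z


units, times = ["a", "b"], [0, 1]
fixed_shares = {"a": {"A": 0.75, "B": 0.25}, "b": {"A": 0.25, "B": 0.75}}
shocks = {0: {"A": 1.0, "B": 0.0}, 1: {"A": -1.0, "B": 2.0}}

# Time-invariant shares: the same exposure row is reused in every period.
z_fixed = panel_bartik(fixed_shares, shocks, units, times)

# Time-varying shares: unit "a" rebalances to 50/50 in period 1.
shares_by_time = {
    0: fixed_shares,
    1: {"a": {"A": 0.5, "B": 0.5}, "b": {"A": 0.25, "B": 0.75}},
}
z_varying = panel_bartik(shares_by_time, shocks, units, times,
                         time_varying_shares=True)
```

With fixed shares, all within-unit variation in the instrument comes from the shocks changing over time; with time-varying shares, changes in exposure also move the instrument.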