
statspai.structural

structural

Structural estimation methods.

BLPResult

Results from BLP demand estimation.

Attributes:

Name Type Description
linear_params Series

Linear parameter estimates (β, α).

nonlinear_params Series

Nonlinear parameter estimates (σ, random coefficient std devs).

se_linear Series

Standard errors for linear parameters.

se_nonlinear Series

Standard errors for nonlinear parameters.

mean_utility Series

Estimated mean utility δ for each product-market.

own_elasticities Series

Own-price elasticities for each product-market.

n_markets int

Number of markets.

n_products int

Total number of product-market observations.

gmm_objective float

Value of the GMM objective at the optimum.

converged bool

Whether the outer-loop optimization converged.
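The `mean_utility` δ is the vector that makes predicted market shares equal observed shares at a given value of the nonlinear parameters. A common way to compute it is the BLP contraction mapping, δ ← δ + log(s_obs) - log(s_pred(δ)). The sketch below applies it to a single market with one random coefficient on a characteristic `x` and simulated standard-normal taste draws. The function names and the one-characteristic setup are illustrative assumptions, not statspai internals.

```python
import numpy as np


def predicted_shares(delta, x, sigma, nu):
    """Simulated random-coefficient logit shares for one market.

    delta : (J,) mean utilities
    x     : (J,) characteristic carrying the random coefficient
    sigma : standard deviation of that random coefficient
    nu    : (R,) standard-normal taste draws, one per simulated consumer
    """
    # Utility of each simulated consumer for each product, shape (R, J)
    u = delta[None, :] + sigma * nu[:, None] * x[None, :]
    # Shift by the row max (floored at 0, the outside good) for numerical stability
    shift = np.maximum(u.max(axis=1, keepdims=True), 0.0)
    expu = np.exp(u - shift)
    denom = np.exp(-shift) + expu.sum(axis=1, keepdims=True)
    # Average individual choice probabilities over simulated consumers
    return (expu / denom).mean(axis=0)


def invert_shares(s_obs, x, sigma, nu, tol=1e-12, max_iter=10_000):
    """BLP contraction: delta <- delta + log(s_obs) - log(s_pred(delta))."""
    s_obs = np.asarray(s_obs, dtype=float)
    # Plain-logit inversion as the starting value
    delta = np.log(s_obs) - np.log(1.0 - s_obs.sum())
    for _ in range(max_iter):
        new = delta + np.log(s_obs) - np.log(predicted_shares(delta, x, sigma, nu))
        if np.max(np.abs(new - delta)) < tol:
            return new
        delta = new
    raise RuntimeError("contraction did not converge")
```

In a full BLP estimator, this inner loop runs once for every trial value of σ that the outer GMM search evaluates.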

summary

summary() -> str

Return a formatted summary of BLP estimation results.

elasticity_matrix

elasticity_matrix(market_id=None) -> DataFrame

Return the full own- and cross-price elasticity matrix for a market.

Parameters:

Name Type Description Default
market_id hashable

Market to return. If None, returns the first market.

None

Returns:

Type Description
DataFrame

(J_m x J_m) elasticity matrix with product labels.
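The plain logit model is a useful reference point for reading this matrix. That model has no random coefficients and enters price linearly in utility with coefficient -α, and its elasticities have closed forms: own e_jj = -α p_j (1 - s_j), cross e_jk = α p_k s_k. The sketch below builds that special case as a labelled DataFrame with entry [j, k] = d ln s_j / d ln p_k. It is for illustration only: BLP random coefficients break this logit pattern, and the row/column orientation is an assumption rather than a documented statspai convention.

```python
import numpy as np
import pandas as pd


def logit_elasticities(shares, prices, alpha, labels):
    """Plain-logit price elasticities; entry [j, k] = d ln s_j / d ln p_k.

    Assumes utility contains -alpha * p_j and no random coefficients.
    """
    s = np.asarray(shares, dtype=float)
    p = np.asarray(prices, dtype=float)
    # Share Jacobian: ds_j/dp_k = alpha * s_j * (s_k - 1{j == k})
    jac = alpha * s[:, None] * (s[None, :] - np.eye(len(s)))
    # Rescale to elasticities by p_k / s_j
    elas = jac * p[None, :] / s[:, None]
    return pd.DataFrame(elas, index=labels, columns=labels)


print(logit_elasticities([0.2, 0.3], [1.0, 2.0], alpha=2.0, labels=["a", "b"]))
```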

diversion_ratios

diversion_ratios(market_id=None) -> DataFrame

Compute diversion ratios for a given market.

Diversion ratio D_{jk} = fraction of consumers leaving product j that switch to product k (rather than the outside option or other products). D_{jk} = (ds_k/dp_j) / (-ds_j/dp_j).

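Equivalently, D_{jk} = -(e_{kj} · s_k) / (e_{jj} · s_j), where e_{kj} = (ds_k/dp_j)(p_j/s_k) is the elasticity of s_k with respect to p_j. This is a ratio of cross- to own-price elasticities weighted by shares. In the plain logit model it reduces to D_{jk} = s_k / (1 - s_j).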
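The definition can be applied directly to any share-price Jacobian. The sketch below assumes a NumPy array `jac` with jac[a, b] = ds_a/dp_b, which is a hypothetical input layout. It returns D with D[j, k] = D_{jk} and leaves the diagonal as NaN, because D_jj is not a diversion ratio. The usage builds a plain-logit Jacobian, where each off-diagonal entry should equal s_k / (1 - s_j).

```python
import numpy as np


def diversion_from_jacobian(jac):
    """D[j, k] = (ds_k/dp_j) / (-ds_j/dp_j), given jac[a, b] = ds_a/dp_b."""
    jac = np.asarray(jac, dtype=float)
    own = np.diag(jac)                    # ds_j/dp_j (negative for downward-sloping demand)
    # Row j needs column j of the Jacobian: ds_k/dp_j = jac[k, j] = jac.T[j, k]
    D = jac.T / (-own)[:, None]
    np.fill_diagonal(D, np.nan)           # D_jj is not a diversion ratio
    return D


# Plain-logit Jacobian with price coefficient alpha = 1.5
s = np.array([0.2, 0.3, 0.1])
jac = 1.5 * s[:, None] * (s[None, :] - np.eye(3))
print(np.round(diversion_from_jacobian(jac), 3))
```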

Parameters:

Name Type Description Default
market_id hashable

Market to compute for. If None, uses the first market.

None

Returns:

Type Description
DataFrame

to_econometric_results

to_econometric_results() -> EconometricResults

Convert to a standard EconometricResults object.

ProductionResult

Bases: EconometricResults

Result object for production function estimation.

Inherits params / std_errors / summary / to_dict from EconometricResults and adds a production-function-specific payload:

  • coef: input elasticities keyed by input name (e.g. {"l": 0.62, "k": 0.31})
  • tfp: firm-time TFP estimates omega_it (in logs); same length as the post-stage-2 working sample
  • residuals: i.i.d. shock eta_it from stage 1
  • productivity_process: {"rho": float, "sigma": float} from the AR fit on omega
  • markup: placeholder; populated by statspai.markup

Use .summary() for a Stata-style table or .coef for the raw dict.
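`productivity_process` summarises the Markov process for omega. With `productivity_degree=1` this is a linear AR(1), omega_it = c + rho * omega_i,t-1 + xi_it. Below is a minimal sketch of that fit on a long panel. It assumes columns named `id`, `year` and `omega`, and it treats `sigma` as the residual standard deviation. That definition is an assumption, since the exact statspai definition is not stated here.

```python
import numpy as np
import pandas as pd


def fit_ar1(df, omega="omega", panel_id="id", time="year"):
    """OLS fit of omega_t = c + rho * omega_{t-1} + xi_t within firms."""
    df = df.sort_values([panel_id, time])
    lag = df.groupby(panel_id)[omega].shift(1)
    # Use a lag only if the previous row is exactly one period earlier
    gap = df[time] - df.groupby(panel_id)[time].shift(1)
    mask = lag.notna() & (gap == 1)
    y = df.loc[mask, omega].to_numpy()
    X = np.column_stack([np.ones(int(mask.sum())), lag[mask].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return {"rho": float(coef[1]), "sigma": float(resid.std(ddof=2))}


# Simulated panel: 300 firms x 20 years, true rho = 0.7, sigma = 0.1
rng = np.random.default_rng(1)
rows = []
for i in range(300):
    w = rng.normal(0.0, 0.1 / np.sqrt(1 - 0.7**2))  # draw from the stationary distribution
    for t in range(2000, 2020):
        rows.append((i, t, w))
        w = 0.7 * w + rng.normal(0.0, 0.1)
panel = pd.DataFrame(rows, columns=["id", "year", "omega"])
print(fit_ar1(panel))
```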

cite

cite() -> str

Return the canonical reference string for self.method.

prod_fn

prod_fn(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str | None = None, panel_id: str = 'id', time: str = 'year', method: str = 'acf', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None, **kwargs) -> ProductionResult

Production function estimation — unified interface.

Parameters:

Name Type Description Default
data DataFrame

Long panel with one row per (firm, year).

required
output str

Log output column.

'y'
free str or list

Free inputs (e.g. labor).

``["l"]``
state str or list

State / predetermined inputs (capital).

``["k"]``
proxy str

Productivity proxy. Defaults: "i" for method="op", "m" for all others.

None
panel_id str

Firm (panel) identifier column.

'id'
time str

Time-period identifier column.

'year'
method ('op', 'lp', 'acf', 'wrdg')

Estimator. ACF is the modern default (it corrects the OP/LP functional-dependence identification problem).

'acf'
polynomial_degree int

Stage-1 control function polynomial degree.

3
productivity_degree int

Productivity AR polynomial degree. Default 1 (linear AR(1)) is the most numerically robust choice — higher degrees can overfit omega_t given omega_{t-1} in finite samples and flatten the GMM objective surface, which makes the structural parameters numerically un-identified even when they are identified in population.

1
functional_form ('cobb-douglas', 'translog')

Functional form. Translog adds 0.5 * x_j**2 own-quadratic terms and x_j * x_k cross terms, so output elasticities vary by firm-time and ProductionResult.model_info["elasticities"] carries a per-row DataFrame.

Translog identification caveat: stage-2 instruments are formed as polynomial transforms of the same raw set used for Cobb-Douglas ((k, l_lag) for ACF, (k, l) for OP/LP). This is standard in the literature, but the resulting moment system can be near-singular when state and lagged free inputs are highly correlated. As a result, the finite-sample variance of the higher-order coefficients (ll, kk, lk) is substantially larger than that of the linear l and k terms. Use bootstrap SEs (boot_reps>=200) to gauge this.

Wooldridge does not yet support translog (raises NotImplementedError); use method="acf" or method="lp" for translog work.

'cobb-douglas'
boot_reps int

Firm-cluster bootstrap replications. 0 ⇒ NaN standard errors.

0
seed int

Bootstrap RNG seed.

None
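Under the translog form described for `functional_form`, the output elasticity of input j at a given firm-year is β_j + β_jj x_j + Σ_{k≠j} β_jk x_k. This is why the elasticities vary row by row. The sketch below shows that computation. It assumes coefficients are keyed as "l", "k", "ll", "kk" and "lk"; this naming is hypothetical and simply mirrors the (ll, kk, lk) labels used in the caveat.

```python
import pandas as pd


def translog_elasticities(df, beta, inputs=("l", "k")):
    """Per-row output elasticities dy/dx_j for a translog in logs.

    Assumes y = sum_j b_j x_j + 0.5 * sum_j b_jj x_j**2 + sum_{j<k} b_jk x_j x_k,
    with coefficients keyed "l", "k", "ll", "kk", "lk" (hypothetical naming).
    """
    out = {}
    for j in inputs:
        # Linear term plus own-quadratic term (the 0.5 cancels on differentiation)
        e = beta[j] + beta[j + j] * df[j]
        for k in inputs:
            if k != j:
                # A cross coefficient is stored once, under either ordering
                key = j + k if j + k in beta else k + j
                e = e + beta[key] * df[k]
        out[j] = e
    return pd.DataFrame(out, index=df.index)


panel = pd.DataFrame({"l": [1.0, 2.0], "k": [0.5, 1.0]})
beta = {"l": 0.6, "k": 0.3, "ll": 0.05, "kk": -0.02, "lk": 0.01}
print(translog_elasticities(panel, beta))
```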

Returns:

Type Description
ProductionResult

.coef for elasticities, .tfp for log-productivity series, .summary() for a Stata-style table.
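`boot_reps` requests a firm-cluster bootstrap. In this scheme, whole firms are resampled with replacement together with all their years, so serial correlation within a firm is preserved. The sketch below is generic. `estimator` is a hypothetical function that maps a panel DataFrame to a coefficient vector. For statspai, that function would be the full multi-stage fit, which is not reproduced here.

```python
import numpy as np
import pandas as pd


def cluster_bootstrap_se(df, estimator, panel_id="id", reps=200, seed=0):
    """Firm-cluster bootstrap standard errors.

    `estimator` maps a panel DataFrame to a 1-D array of coefficients.
    Each draw resamples firm ids with replacement and keeps every year of
    a drawn firm; duplicated firms get fresh ids so they stay separate panels.
    """
    rng = np.random.default_rng(seed)
    firms = {g: sub for g, sub in df.groupby(panel_id)}
    ids = np.array(list(firms))
    draws = []
    for _ in range(reps):
        pick = rng.choice(ids, size=len(ids), replace=True)
        boot = pd.concat(
            [firms[g].assign(**{panel_id: new_id}) for new_id, g in enumerate(pick)],
            ignore_index=True,
        )
        draws.append(np.asarray(estimator(boot), dtype=float))
    return np.vstack(draws).std(axis=0, ddof=1)
```

Relabelling duplicated firms matters for panel estimators: otherwise two copies of the same firm would be stitched into one panel with repeated years.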

Examples:

>>> import statspai as sp
>>> res = sp.prod_fn(df, output="y", free="l", state="k", proxy="m",
...                   panel_id="id", time="year",
...                   method="acf", boot_reps=200, seed=0)
>>> res.coef
{"l": 0.62, "k": 0.32}
>>> mu = sp.markup(res, revenue="log_rev", input_cost="log_mat",
...                 flexible_input="m")

See Also

olley_pakes, levinsohn_petrin, ackerberg_caves_frazer, wooldridge_prod
markup : De Loecker-Warzynski (2012) firm-time markup.

References

Olley & Pakes (1996); Levinsohn & Petrin (2003); Ackerberg, Caves & Frazer (2015); Wooldridge (2009).

olley_pakes

olley_pakes(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'i', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None, drop_zero_proxy: bool = True) -> ProductionResult

Olley-Pakes (1996) production function estimator.

Uses investment as the proxy for unobserved productivity. Firms with zero investment are dropped by default — the inversion of the investment policy requires a strictly positive proxy.
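Dropping zero-investment rows costs more than the raw drop count, because stage 2 also needs each firm's previous period. The small sketch below illustrates this, assuming columns `id`, `year` and `i`, which follow the defaults listed for this function. It builds the stage-1 sample and the subset of rows that have a usable one-period lag.

```python
import pandas as pd


def op_samples(df, proxy="i", panel_id="id", time="year"):
    """Return (stage-1 sample, rows that also have a usable t-1 lag)."""
    # The OP inversion needs a strictly positive investment proxy
    stage1 = df[df[proxy] > 0].sort_values([panel_id, time])
    prev = stage1.groupby(panel_id)[time].shift(1)
    # Stage 2 needs the same firm observed (and retained) in the prior year
    has_lag = (stage1[time] - prev) == 1
    return stage1, stage1[has_lag]


panel = pd.DataFrame({
    "id":   [1, 1, 1, 1, 1, 2, 2, 2],
    "year": [2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002],
    "i":    [1.0, 0.0, 2.0, 3.0, 4.0, 1.0, 1.0, 1.0],
})
s1, s2 = op_samples(panel)
print(len(s1), len(s2))  # prints: 7 4
```

In this toy panel, the single zero-investment year removes one stage-1 row but two stage-2 rows (firm 1's 2001 and 2002). With no zero, stage 2 would have 6 rows instead of 4.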

Parameters:

Name Type Description Default
data DataFrame

Long-form panel: one row per (firm, year).

required
output str

Log output column.

``"y"``
free str or list

Freely chosen inputs (e.g. labor). Multiple are allowed.

``["l"]``
state str or list

State inputs (capital, predetermined).

``["k"]``
proxy str

Investment column (must be > 0 to invert).

``"i"``
panel_id str

Firm identifier column.

'id'
time str

Time-period (year) identifier column.

'year'
polynomial_degree int

Degree of the stage-1 polynomial in (free, state, proxy).

3
productivity_degree int

Degree of the polynomial g in the AR productivity process.

1
functional_form ('cobb-douglas', 'translog')

Production function form. Translog adds quadratic and cross terms; ProductionResult.model_info["elasticities"] then carries firm-time output elasticities.

'cobb-douglas'
boot_reps int

Firm-cluster bootstrap replications. 0 ⇒ NaN standard errors.

0
seed int

Bootstrap RNG seed.

None
drop_zero_proxy bool

Drop rows with a non-positive proxy (required by the OP inversion). Dropping period t for a firm also forfeits period t+1 for that firm in stage 2, because period t+1 then has no lagged observation. Firms with sporadic zero-investment years therefore lose more observations than the raw drop count.

True

Returns:

Type Description
ProductionResult

Examples:

>>> import statspai as sp
>>> res = sp.olley_pakes(df, output="y", free="l", state="k",
...                       proxy="i", panel_id="id", time="year",
...                       boot_reps=200, seed=0)
>>> res.coef           # {"l": 0.62, "k": 0.31}
>>> res.summary()

References

Olley, G.S. & Pakes, A. (1996). The dynamics of productivity in the telecommunications equipment industry. Econometrica, 64(6), 1263-1297.

levinsohn_petrin

levinsohn_petrin(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Levinsohn-Petrin (2003) production function estimator.

Uses intermediate input (materials / energy) as the productivity proxy. Avoids the OP zero-investment selection problem because most firms use materials in every period.
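The classic LP first stage is a partially linear regression: y is regressed on labor plus a polynomial in (k, m). The labor coefficient comes from that regression, and the polynomial part estimates phi = β_k k + ω. Below is a minimal sketch on simulated data. The helper names are hypothetical, and statspai's own stage-1 polynomial may be specified differently. The simulation deliberately gives labor its own independent shocks, which is exactly the variation ACF argues may be absent.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly_terms(cols, degree):
    """All monomials of the given 1-D arrays up to `degree` (no constant)."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(cols)), d):
            terms.append(np.prod([cols[c] for c in combo], axis=0))
    return np.column_stack(terms)


def lp_stage1(y, l, k, m, degree=3):
    """Regress y on l plus a polynomial in (k, m).

    Returns beta_l and phi_hat, the fitted control-function part
    (beta_k * k + omega, up to a constant).
    """
    X = np.column_stack([np.ones_like(y), l, poly_terms([k, m], degree)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    phi_hat = X @ coef - coef[1] * l
    return coef[1], phi_hat


# Simulated data: materials m is an invertible function of (omega, k),
# and labor has variation that is independent of (k, m).
rng = np.random.default_rng(2)
n = 2000
k = rng.normal(size=n)
omega = rng.normal(scale=0.5, size=n)
m = 0.5 * k + omega                      # so omega = m - 0.5 * k
l = 0.4 * omega + rng.normal(size=n)     # labor responds to omega, plus its own shocks
y = 0.6 * l + 0.3 * k + omega + rng.normal(scale=0.1, size=n)
beta_l, phi_hat = lp_stage1(y, l, k, m)
print(round(beta_l, 3))
```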

Parameters:

Name Type Description Default
data DataFrame

Long-form panel.

required
output str

Log output.

'y'
free str or list

Free inputs.

``["l"]``
state str or list

State inputs.

``["k"]``
proxy str

Intermediate input (materials).

``"m"``
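panel_id str

Firm identifier column.

'id'
time str

Time-period identifier column.

'year'
polynomial_degree int

Degree of the stage-1 polynomial in (free, state, proxy).

3
productivity_degree int

Degree of the polynomial g in the AR productivity process.

1
functional_form ('cobb-douglas', 'translog')

Production function form. Translog adds quadratic and cross terms; ProductionResult.model_info["elasticities"] then carries firm-time output elasticities.

'cobb-douglas'
boot_reps int

Firm-cluster bootstrap replications. 0 ⇒ NaN standard errors.

0
seed int

Bootstrap RNG seed.

None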

Returns:

Type Description
ProductionResult

References

Levinsohn, J. & Petrin, A. (2003). Estimating production functions using inputs to control for unobservables. Review of Economic Studies, 70(2), 317-341.

ackerberg_caves_frazer

ackerberg_caves_frazer(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Ackerberg-Caves-Frazer (2015) production function estimator.

Corrects the OP / LP "functional dependence" identification problem: when free inputs (labor) are chosen at the same time as the proxy, the labor coefficient is not identified in the stage-1 polynomial. ACF moves all coefficient identification to stage 2, instrumenting free inputs with their lagged values and state inputs at the contemporaneous level.
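ACF stage 2 can be sketched as follows, under the assumption that the stage-1 fitted values phi are known exactly (this skips stage 1 for brevity). Each candidate (b_l, b_k) implies omega_t = phi_t - b_l l_t - b_k k_t. That omega_t is regressed on omega_{t-1}, a linear AR(1) matching `productivity_degree=1`. The resulting innovation xi_t is then required to be orthogonal to k_t and l_{t-1}. The simulated data, the Nelder-Mead search and the identity weighting are illustrative choices, not a description of statspai's optimizer.

```python
import numpy as np
from scipy.optimize import minimize


def acf_moments(beta, phi, phi_lag, l, k, l_lag, k_lag):
    """Sample moments E[xi_t * (k_t, l_{t-1})] for candidate beta = (b_l, b_k)."""
    b_l, b_k = beta
    w = phi - b_l * l - b_k * k                  # implied omega_t
    w_lag = phi_lag - b_l * l_lag - b_k * k_lag  # implied omega_{t-1}
    # Linear AR(1) for omega, re-estimated by OLS at each candidate beta
    X = np.column_stack([np.ones_like(w_lag), w_lag])
    g, *_ = np.linalg.lstsq(X, w, rcond=None)
    xi = w - X @ g                               # productivity innovation
    Z = np.column_stack([k, l_lag])              # predetermined k_t, lagged labor
    return Z.T @ xi / len(xi)


def acf_stage2(phi, phi_lag, l, k, l_lag, k_lag, start=(0.5, 0.5)):
    """Minimise the unweighted sum of squared moments (identity weight)."""
    def objective(b):
        return float(np.sum(acf_moments(b, phi, phi_lag, l, k, l_lag, k_lag) ** 2))
    res = minimize(objective, np.asarray(start, dtype=float), method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-12, "maxiter": 4000})
    return res.x


# Simulated panel: true b_l = 0.6, b_k = 0.3; omega is AR(1) with rho = 0.7
rng = np.random.default_rng(0)
N, T, rho = 500, 10, 0.7
omega = np.zeros((N, T))
k = np.zeros((N, T))
omega[:, 0] = rng.normal(0.0, 0.2 / np.sqrt(1 - rho**2), N)
k[:, 0] = rng.normal(0.0, 0.5, N)
for t in range(1, T):
    omega[:, t] = rho * omega[:, t - 1] + rng.normal(0.0, 0.2, N)
    # Capital is chosen at t-1, so it cannot react to the time-t innovation
    k[:, t] = 0.8 * k[:, t - 1] + 0.5 * omega[:, t - 1] + rng.normal(0.0, 0.3, N)
# Labor is chosen at t and responds to current omega (hence endogenous)
l = 0.5 * omega + 0.2 * k + rng.normal(0.0, 0.5, (N, T))
phi = 0.6 * l + 0.3 * k + omega

cur, lag = np.s_[:, 1:], np.s_[:, :-1]
b_hat = acf_stage2(phi[cur].ravel(), phi[lag].ravel(), l[cur].ravel(),
                   k[cur].ravel(), l[lag].ravel(), k[lag].ravel())
print(np.round(b_hat, 3))
```

Using l_{t-1} rather than l_t as an instrument is what allows labor to respond to the current productivity shock without biasing the moment.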

Parameters:

Name Type Description Default
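data DataFrame

Long-form panel.

required
output str

Log output.

'y'
free str or list

Free inputs, instrumented with their lag in stage 2.

``["l"]``
state str or list

State inputs (capital), used as their own contemporaneous instruments in stage 2.

``["k"]``
proxy str

Intermediate input (materials).

``"m"``
panel_id str

Firm identifier column.

'id'
time str

Time-period identifier column.

'year'
polynomial_degree int

Degree of the stage-1 polynomial.

3
productivity_degree int

Degree of the polynomial g in the AR productivity process.

1
functional_form ('cobb-douglas', 'translog')

Production function form. Translog adds quadratic and cross terms; ProductionResult.model_info["elasticities"] then carries firm-time output elasticities.

'cobb-douglas'
boot_reps int

Firm-cluster bootstrap replications. 0 ⇒ NaN standard errors.

0
seed int

Bootstrap RNG seed.

None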

Returns:

Type Description
ProductionResult

Notes

Requires at least two consecutive time periods per firm so that lagged labor exists.

References

Ackerberg, D.A., Caves, K. & Frazer, G. (2015). Identification properties of recent production function estimators. Econometrica, 83(6), 2411-2451.

wooldridge_prod

wooldridge_prod(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 2, productivity_degree: int = 2, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Wooldridge (2009) joint production function estimator (stacked NLS).

Estimates (beta_l, beta_k) jointly with the nonparametric control function h(m, k) and the productivity Markov polynomial g(omega_{t-1}). It does so by minimizing the sum of squared residuals over a stacked system made up of the level equation and the productivity-substituted equation. This is equivalent to one-step GMM with an identity weight matrix, with the derivatives of the fitted values with respect to the parameters serving as instruments (the NLS first-order conditions). A full GMM version with optimal weighting is on the roadmap.

Parameters:

Name Type Description Default
data DataFrame

Long panel with one row per (firm, year).

required
output str

Log output column.

``"y"``
free str or list

Free inputs (labor).

``["l"]``
state str or list

State inputs (capital).

``["k"]``
proxy str

Productivity proxy (typically intermediate input).

``"m"``
panel_id str

Firm identifier column.

'id'
time str

Time-period identifier column.

'year'
polynomial_degree int

Degree of h(m, k). Smaller than the OP/LP/ACF default (3) because the joint problem is higher-dimensional.

2
productivity_degree int

Degree of the AR polynomial g(omega_{t-1}).

2
boot_reps int

Firm-cluster bootstrap replications. 0 ⇒ NaN standard errors.

0
seed int
None

Returns:

Type Description
ProductionResult

References

Wooldridge, J.M. (2009). On estimating firm-level production functions using proxy variables to control for unobservables. Economics Letters, 104(3), 112-114.