`statspai.structural`

structural

Structural estimation methods.

BLPResult

Bases: ResultProtocolMixin

Results from BLP demand estimation.

Attributes:

Name	Type	Description
`linear_params`	`Series`	Linear parameter estimates (β, α).
`nonlinear_params`	`Series`	Nonlinear parameter estimates (σ: standard deviations of the random coefficients).
`se_linear`	`Series`	Standard errors for linear parameters.
`se_nonlinear`	`Series`	Standard errors for nonlinear parameters.
`mean_utility`	`Series`	Estimated mean utility δ for each product-market.
`own_elasticities`	`Series`	Own-price elasticities for each product-market.
`n_markets`	`int`	Number of markets.
`n_products`	`int`	Total number of product-market observations.
`gmm_objective`	`float`	Value of the GMM objective at the optimum.
`converged`	`bool`	Whether the outer-loop optimization converged.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for m in range(10):  # 10 markets, 5 products each
...     x1 = rng.normal(0, 1, 5)
...     x2 = rng.normal(0, 1, 5)
...     price = rng.uniform(1, 3, 5)
...     delta = x1 + 0.5 * x2 - price + rng.normal(0, 0.2, 5)
...     ex = np.exp(delta)
...     share = ex / (1 + ex.sum())
...     for j in range(5):
...         rows.append({"market_id": m, "product_id": j,
...                      "share": share[j], "price": price[j],
...                      "x1": x1[j], "x2": x2[j]})
>>> df = pd.DataFrame(rows)
>>> res = sp.blp(df, shares="share", prices="price",
...              x_linear=["x1", "x2"], x_random=["x1"],
...              n_draws=50, maxiter=50, seed=0)
>>> type(res).__name__
'BLPResult'
>>> (res.n_markets, res.n_products)
(10, 50)
>>> bool("x1" in res.linear_params.index)
True
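`mean_utility` holds δ, the mean utilities that reproduce the observed shares for given σ. To illustrate that inversion step only, here is a minimal NumPy sketch of Berry's contraction mapping for one market with a single random coefficient. The helper names, the 200 simulated draws and σ = 0.5 are assumptions for the example; this is not the `sp.blp` inner loop.

```python
import numpy as np


def rc_logit_shares(delta, x, sigma, nu):
    """Simulated shares for a random-coefficients logit in one market.

    delta: (J,) mean utilities; x: (J,) characteristic with a random
    coefficient; sigma: std dev of that coefficient; nu: (R,) N(0, 1) draws.
    """
    # Utility of product j for simulated consumer r: delta_j + sigma * nu_r * x_j
    u = delta[:, None] + sigma * x[:, None] * nu[None, :]
    ex = np.exp(u)
    # Outside good utility is normalized to 0, hence the "1 +" below
    s_r = ex / (1.0 + ex.sum(axis=0, keepdims=True))
    return s_r.mean(axis=1)


def blp_contraction(s_obs, x, sigma, nu, tol=1e-12, maxiter=10_000):
    """Invert observed shares into mean utilities (hypothetical helper)."""
    s0 = 1.0 - s_obs.sum()
    delta = np.log(s_obs) - np.log(s0)  # plain-logit inversion as starting value
    for _ in range(maxiter):
        delta_new = delta + np.log(s_obs) - np.log(rc_logit_shares(delta, x, sigma, nu))
        if np.max(np.abs(delta_new - delta)) < tol:
            return delta_new
        delta = delta_new
    raise RuntimeError("contraction mapping did not converge")


rng = np.random.default_rng(0)
x = rng.normal(0, 1, 5)
true_delta = x - rng.uniform(1, 3, 5)
nu = rng.standard_normal(200)
s_obs = rc_logit_shares(true_delta, x, sigma=0.5, nu=nu)
delta_hat = blp_contraction(s_obs, x, sigma=0.5, nu=nu)
print(np.max(np.abs(delta_hat - true_delta)))
```

Starting from the plain-logit inversion log(s_j) - log(s_0) is a common choice: with σ = 0 that starting value is already the exact answer.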

summary

summary() -> str

Return a formatted summary of BLP estimation results.

elasticity_matrix

elasticity_matrix(market_id: Optional[Any] = None) -> DataFrame

Return the full own- and cross-price elasticity matrix for a market.

Parameters:

Name	Type	Description	Default
`market_id`	`hashable`	Market to return. If None, returns the first market.	`None`

Returns:

Type	Description
`DataFrame`	(J_m x J_m) elasticity matrix with product labels.
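To make the layout concrete, the sketch below builds the elasticity matrix for a plain logit (no random coefficients), where every entry has a closed form. It assumes utility linear in price with coefficient -α and the convention that entry [j, k] is the elasticity of s_j with respect to p_k; check the row and column labels of the returned DataFrame before relying on either orientation. The function name and inputs are hypothetical.

```python
import numpy as np


def logit_elasticity_matrix(prices, shares, alpha):
    """Plain-logit elasticities; entry [j, k] = (d s_j / d p_k) * p_k / s_j."""
    s = np.asarray(shares, dtype=float)
    p = np.asarray(prices, dtype=float)
    # Logit share derivatives: d s_j / d p_k = -alpha * s_j * (1{j == k} - s_k)
    dsdp = -alpha * (np.diag(s) - np.outer(s, s))
    return dsdp * p[None, :] / s[:, None]


prices = np.array([1.5, 2.0, 2.5])
shares = np.array([0.2, 0.15, 0.1])
E = logit_elasticity_matrix(prices, shares, alpha=1.0)
print(np.round(E, 3))
```

Every off-diagonal entry in column k equals α p_k s_k, the logit IIA pattern; the random coefficients in BLP are what let cross-elasticities differ across rows.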

diversion_ratios

diversion_ratios(market_id: Optional[Any] = None) -> DataFrame

Compute diversion ratios for a given market.

The diversion ratio D_{jk} is the fraction of consumers leaving product j after a price increase who switch to product k (rather than to the outside option or to other products): D_{jk} = (∂s_k/∂p_j) / (-∂s_j/∂p_j).

Equivalently, in elasticity terms, D_{jk} = -(ε_{kj} s_k) / (ε_{jj} s_j), where ε_{kj} = (∂s_k/∂p_j)(p_j/s_k). For the plain logit model this reduces to s_k / (1 - s_j).
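A small NumPy check of the definition, assuming plain logit with α = 1 and made-up shares; the helper name is hypothetical and this is not the `diversion_ratios` implementation. The Jacobian is arranged so that row j is the product whose price changes. Whatever does not divert to other inside goods goes to the outside option, so each row of inside-good ratios plus the outside-good diversion sums to one.

```python
import numpy as np


def diversion_ratios_from_jacobian(dsdp):
    """D[j, k] = (d s_k / d p_j) / (-d s_j / d p_j); diagonal set to NaN.

    dsdp[j, k] holds d s_k / d p_j (row = product whose price changes).
    """
    own = np.diag(dsdp)
    D = dsdp / (-own[:, None])  # divide row j by -d s_j / d p_j
    np.fill_diagonal(D, np.nan)
    return D


alpha = 1.0
s = np.array([0.2, 0.15, 0.1])  # inside-good shares; outside share is 0.55
# Plain-logit Jacobian: d s_k / d p_j = -alpha * s_k * (1{j == k} - s_j)
dsdp = -alpha * (np.diag(s) - np.outer(s, s))
D = diversion_ratios_from_jacobian(dsdp)
print(np.round(D, 4))
```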

Parameters:

Name	Type	Description	Default
`market_id`	`hashable`	Market to compute for. If None, uses the first market.	`None`

Returns:

Type	Description
`DataFrame`

to_econometric_results

to_econometric_results() -> EconometricResults

Convert to a standard EconometricResults object.

ProductionResult

Bases: EconometricResults

Result object for production function estimation.

Inherits params / std_errors / summary / to_dict from `EconometricResults` and adds a production-function-specific payload:

coef: input elasticities keyed by input name (e.g. {"l": 0.62, "k": 0.31})
tfp: firm-time TFP estimates omega_it (in logs); same length as the post-stage-2 working sample
residuals: i.i.d. shock eta_it from stage 1
productivity_process: {"rho": float, "sigma": float} from the AR fit on omega
markup: placeholder; populated by `statspai.markup`

Use .summary() for a Stata-style table or .coef for the raw dict.
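With `productivity_degree=1` the productivity process is a linear AR(1) in ω. The sketch below fits one on a long panel, assuming TFP sits in a column next to firm and year identifiers and that `sigma` means the standard deviation of the innovation; both are assumptions about the payload, not documented behavior, and `fit_ar1_productivity` is a hypothetical helper. Only consecutive years are paired, so a gap in a firm's history does not create a spurious lag. The simulated panel (2,000 firms, 20 years, ρ = 0.7, innovation s.d. 0.2) is illustrative.

```python
import numpy as np
import pandas as pd


def fit_ar1_productivity(df, omega="omega", panel_id="id", time="year"):
    """OLS fit of omega_it = c + rho * omega_{i,t-1} + xi_it on consecutive years."""
    lag = df[[panel_id, time, omega]].copy()
    lag[time] = lag[time] + 1  # value observed at t-1 is aligned with row t
    merged = df.merge(lag, on=[panel_id, time], suffixes=("", "_lag"))
    X = np.column_stack([np.ones(len(merged)), merged[omega + "_lag"].to_numpy()])
    yv = merged[omega].to_numpy()
    coef, *_ = np.linalg.lstsq(X, yv, rcond=None)
    resid = yv - X @ coef
    return {"rho": float(coef[1]), "sigma": float(resid.std(ddof=2))}


rng = np.random.default_rng(0)
rows = []
for fid in range(2000):
    w = rng.normal(0.0, 0.28)
    for yr in range(20):
        w = 0.7 * w + rng.normal(0.0, 0.2)
        rows.append({"id": fid, "year": yr, "omega": w})
panel = pd.DataFrame(rows)
proc = fit_ar1_productivity(panel)
print(proc)
```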

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(30):
...     k = rng.normal(2.0, 0.5)
...     omega = rng.normal(0.0, 0.3)
...     for yr in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.2)
...         k = 0.9 * k + 0.3 * rng.normal(1.0, 0.3)
...         l = 0.6 * k + omega + rng.normal(1.0, 0.2)
...         m = 0.5 * k + 0.5 * l + omega + rng.normal(0.5, 0.2)
...         y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1)
...         rows.append(dict(id=fid, year=yr, y=y, l=l, k=k, m=m))
>>> df = pd.DataFrame(rows)
>>> res = sp.levinsohn_petrin(df, output="y", free="l", state="k", proxy="m")
>>> type(res).__name__
'ProductionResult'
>>> sorted(res.coef.keys())
['k', 'l']
>>> res.method
'lp'

cite

cite(format: str = 'bibtex') -> str

Return the canonical reference for the estimator recorded in `self.method` (e.g. `'lp'` or `'acf'`), formatted as BibTeX by default.

prod_fn

prod_fn(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str | None = None, panel_id: str = 'id', time: str = 'year', method: str = 'acf', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None, **kwargs: Any) -> ProductionResult

Production function estimation — unified interface.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long panel with one row per (firm, year).	required
`output`	`str`	Log output column.	`'y'`
`free`	`str or list`	Free inputs (e.g. labor).	`["l"]`
`state`	`str or list`	State / predetermined inputs (capital).	`["k"]`
`proxy`	`str`	Productivity proxy. Defaults: `"i"` for `method="op"`, `"m"` for all others.	`None`
`panel_id`	`str`	Firm (panel) identifier column.	`'id'`
`time`	`str`	Time (year) identifier column.	`'year'`
`method`	`('op', 'lp', 'acf', 'wrdg')`	Estimator. ACF is the modern default (it corrects the OP/LP functional-dependence identification problem).	`'acf'`
`polynomial_degree`	`int`	Stage-1 control function polynomial degree.	`3`
`productivity_degree`	`int`	Productivity AR polynomial degree. Default `1` (linear AR(1)) is the most numerically robust choice — higher degrees can overfit `omega_t` given `omega_{t-1}` in finite samples and flatten the GMM objective surface, which makes the structural parameters numerically un-identified even when they are identified in population.	`1`
`functional_form`	`('cobb-douglas', 'translog')`	Functional form. Translog adds 0.5 * x_j**2 own-quadratic terms and x_j * x_k cross terms, so output elasticities vary by firm-time and `ProductionResult.model_info["elasticities"]` carries a per-row DataFrame. Translog identification caveat: stage-2 instruments are formed as polynomial transforms of the same raw set used for Cobb-Douglas (`(k, l_lag)` for ACF, `(k, l)` for OP/LP). This is standard in the literature, but the resulting moment system can be near-singular when state and lagged free inputs are highly correlated, so the finite-sample variance of the higher-order coefficients (`ll`, `kk`, `lk`) is substantially larger than that of the linear `l` and `k` terms. Use bootstrap SEs (`boot_reps>=200`) to gauge this. Wooldridge does not yet support translog (raises `NotImplementedError`); use `method="acf"` or `method="lp"` for translog work.	`'cobb-douglas'`
`boot_reps`	`int`	Firm-cluster bootstrap replications. `0` ⇒ NaN standard errors.	`0`
`seed`	`int`	Bootstrap RNG seed.	`None`
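Translog elasticities vary by row because differentiating log output with respect to log labor gives b_l + b_ll·l + b_lk·k, which depends on the firm's own input levels. A two-input sketch follows, using hypothetical coefficient keys that mirror the term names `l`, `k`, `ll`, `kk`, `lk`; it makes no assumption about the layout of `model_info["elasticities"]`.

```python
import numpy as np


def translog_output(l, k, b):
    """Log output for a two-input translog (hypothetical coefficient keys)."""
    return (b["l"] * l + b["k"] * k
            + 0.5 * b["ll"] * l**2 + 0.5 * b["kk"] * k**2 + b["lk"] * l * k)


def translog_elasticities(l, k, b):
    """Firm-time output elasticities: partial derivatives of log output."""
    l = np.asarray(l, dtype=float)
    k = np.asarray(k, dtype=float)
    e_l = b["l"] + b["ll"] * l + b["lk"] * k
    e_k = b["k"] + b["kk"] * k + b["lk"] * l
    return e_l, e_k


b = {"l": 0.6, "k": 0.3, "ll": 0.05, "kk": -0.02, "lk": 0.01}
l = np.array([1.0, 2.0])
k = np.array([0.5, 1.5])
e_l, e_k = translog_elasticities(l, k, b)  # e_l = [0.655, 0.715], e_k = [0.30, 0.29]
```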

Returns:

Type	Description
`ProductionResult`	`.coef` for elasticities, `.tfp` for log-productivity series, `.summary()` for a Stata-style table.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(60):
...     omega = rng.normal(0.0, 0.28)
...     k = rng.normal(0.0, 0.5)
...     for t in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.20)
...         l = 0.5 * omega + 0.3 * k + rng.normal(0.0, 0.10)
...         m = 0.8 * omega + 0.5 * k + rng.normal(0.0, 0.05)
...         y = 0.60 * l + 0.35 * k + omega + rng.normal(0.0, 0.10)
...         rows.append({"id": fid, "year": t, "y": y, "l": l, "k": k, "m": m})
...         k = 0.9 * k + 0.1 * rng.normal(0.5, 0.1)
>>> df = pd.DataFrame(rows)
>>> res = sp.prod_fn(df, output="y", free="l", state="k", proxy="m",
...                   panel_id="id", time="year", method="acf", seed=0)
>>> sorted(res.coef)  # output elasticities for free + state inputs
['k', 'l']
>>> # De Loecker-Warzynski markup needs the flexible input among the
>>> # production-function coefficients, so refit with materials as free:
>>> res_m = sp.prod_fn(df, output="y", free=["l", "m"], state="k",
...                    proxy="m", panel_id="id", time="year", method="acf")
>>> mu = sp.markup(
...     res_m, revenue="y", input_cost="m", flexible_input="m"
... )
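The De Loecker-Warzynski markup is the flexible input's output elasticity divided by that input's expenditure share of revenue. A minimal sketch in levels follows; whether `sp.markup` expects logs or levels, and whether it applies the stage-1 residual correction, is not stated above, so the argument conventions here are assumptions and `dlw_markup` is a hypothetical helper.

```python
import numpy as np


def dlw_markup(theta, input_cost, revenue, eps_hat=None):
    """Markup = output elasticity of the flexible input / its revenue share.

    theta:      output elasticity of the flexible input (scalar or array)
    input_cost: expenditure on the flexible input, in levels
    revenue:    sales revenue, in levels
    eps_hat:    optional stage-1 residual; observed revenue is divided by
                exp(eps_hat) to strip the measurement-error component
    """
    revenue = np.asarray(revenue, dtype=float)
    if eps_hat is not None:
        revenue = revenue / np.exp(np.asarray(eps_hat, dtype=float))
    share = np.asarray(input_cost, dtype=float) / revenue
    return np.asarray(theta, dtype=float) / share


mu = dlw_markup(theta=0.5, input_cost=[40.0, 25.0], revenue=[100.0, 100.0])  # [1.25, 2.0]
```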

olley_pakes

olley_pakes(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'i', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None, drop_zero_proxy: bool = True) -> ProductionResult

Olley-Pakes (1996) production function estimator.

Uses investment as the proxy for unobserved productivity. Firm-years with zero investment are dropped by default, because inverting the investment policy requires a strictly positive proxy.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-form panel: one row per (firm, year).	required
`output`	`str`	Log output column.	`"y"`
`free`	`str or list`	Freely chosen inputs (e.g. labor). Multiple are allowed.	`["l"]`
`state`	`str or list`	State inputs (capital, predetermined).	`["k"]`
`proxy`	`str`	Investment column (must be > 0 to invert).	`"i"`
`panel_id`	`str`	Firm identifier column.	`'id'`
`time`	`str`	Year identifier column.	`'year'`
`polynomial_degree`	`int`	Degree of the stage-1 polynomial in (free, state, proxy).	`3`
`productivity_degree`	`int`	Degree of the polynomial g in the AR productivity process.	`1`
`functional_form`	`('cobb-douglas', 'translog')`	Production function form. Translog adds quadratic and cross terms; `ProductionResult.model_info["elasticities"]` then carries firm-time output elasticities.	`'cobb-douglas'`
`boot_reps`	`int`	Firm-cluster bootstrap replications. `0` ⇒ NaN standard errors.	`0`
`seed`	`int`	Bootstrap RNG seed.	`None`
`drop_zero_proxy`	`bool`	Drop rows with a non-positive proxy (required by the OP inversion). Dropping period `t` for a firm also forfeits period `t+1` for that firm in stage 2, because the lagged value needed at `t+1` is now missing; firms with sporadic zero-investment years therefore lose more observations than the raw drop count.	`True`
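To see why the stage-2 loss exceeds the raw drop count, here is a pandas sketch with one hypothetical firm that reports zero investment in year 2. The helper `rows_with_lag` is illustrative and assumes a row is usable in stage 2 only when the exact previous year is present.

```python
import pandas as pd

# One hypothetical firm, six years, zero investment in year 2
panel = pd.DataFrame({"id": 1, "year": range(6), "i": [0.5, 0.4, 0.0, 0.6, 0.3, 0.7]})


def rows_with_lag(df, panel_id="id", time="year"):
    """Rows whose previous year (t-1) is also present for the same firm."""
    prev = df[[panel_id, time]].assign(**{time: df[time] + 1})
    return df.merge(prev, on=[panel_id, time])


full = rows_with_lag(panel)
kept = panel[panel["i"] > 0]  # drop_zero_proxy-style filter
after_drop = rows_with_lag(kept)
print(len(panel) - len(kept), len(full) - len(after_drop))  # prints: 1 2
```

Dropping one firm-year removes two stage-2 observations: year 2 itself (which needed year 1) and year 3 (which needed year 2).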

Returns:

Type	Description
`ProductionResult`

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(30):
...     k = rng.normal(2.0, 0.5)
...     omega = rng.normal(0.0, 0.3)
...     for yr in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.2)
...         k = 0.9 * k + 0.3 * rng.normal(1.0, 0.3)
...         l = 0.6 * k + omega + rng.normal(1.0, 0.2)
...         inv = 0.4 * k + omega + rng.normal(1.0, 0.2)  # investment > 0
...         y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1)
...         rows.append(dict(id=fid, year=yr, y=y, l=l, k=k, i=inv))
>>> df = pd.DataFrame(rows)
>>> res = sp.olley_pakes(df, output="y", free="l", state="k", proxy="i")
>>> sorted(res.coef.keys())
['k', 'l']
>>> res.method
'op'

References

Olley, G.S. & Pakes, A. (1996). The dynamics of productivity in the telecommunications equipment industry. Econometrica, 64(6), 1263-1297.

levinsohn_petrin

levinsohn_petrin(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Levinsohn-Petrin (2003) production function estimator.

Uses intermediate input (materials / energy) as the productivity proxy. Avoids the OP zero-investment selection problem because most firms use materials in every period.
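Stage 1 of OP/LP is a partially linear regression: output on the free input plus a polynomial in (state, proxy) that absorbs productivity. The simulation below is a sketch under assumed data: materials are an exact, invertible function of ω and k, and labor carries its own shock. It recovers only the labor coefficient; the capital coefficient comes from a second stage that is not shown, and `poly2` is a hypothetical helper.

```python
import numpy as np


def poly2(*cols):
    """Constant, levels, squares and pairwise products of the given columns."""
    terms = [np.ones_like(cols[0])] + list(cols)
    for i, a in enumerate(cols):
        for b in cols[i:]:
            terms.append(a * b)
    return np.column_stack(terms)


rng = np.random.default_rng(0)
n = 5000
k = rng.normal(1.0, 0.5, n)
omega = rng.normal(0.0, 0.3, n)
m = 0.8 * omega + 0.4 * k                            # materials: invertible in omega given k
l = 0.5 * omega + 0.3 * k + rng.normal(0.0, 0.2, n)  # labor has its own shock
y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1, n)

# Stage 1: y = b_l * l + phi(k, m) + eps, with phi approximated by a polynomial
X = np.column_stack([l, poly2(k, m)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b_l_hat = coef[0]
print(round(b_l_hat, 3))
```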

Parameters:

Name	Type	Description	Default
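`data`	`DataFrame`	Long-form panel: one row per (firm, year).	required
`output`	`str`	Log output column.	`'y'`
`free`	`str or list`	Free inputs (e.g. labor).	`["l"]`
`state`	`str or list`	State inputs (capital).	`["k"]`
`proxy`	`str`	Intermediate input (materials).	`"m"`
`panel_id`	`str`	Firm identifier column.	`'id'`
`time`	`str`	Year identifier column.	`'year'`
`polynomial_degree`	`int`	Stage-1 control function polynomial degree.	`3`
`productivity_degree`	`int`	Degree of the polynomial g in the AR productivity process.	`1`
`functional_form`	`('cobb-douglas', 'translog')`	Production function form.	`'cobb-douglas'`
`boot_reps`	`int`	Firm-cluster bootstrap replications. `0` ⇒ NaN standard errors.	`0`
`seed`	`int`	Bootstrap RNG seed.	`None`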

Returns:

Type	Description
`ProductionResult`

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(30):
...     k = rng.normal(2.0, 0.5)
...     omega = rng.normal(0.0, 0.3)
...     for yr in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.2)
...         k = 0.9 * k + 0.3 * rng.normal(1.0, 0.3)
...         l = 0.6 * k + omega + rng.normal(1.0, 0.2)
...         m = 0.5 * k + 0.5 * l + omega + rng.normal(0.5, 0.2)
...         y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1)
...         rows.append(dict(id=fid, year=yr, y=y, l=l, k=k, m=m))
>>> df = pd.DataFrame(rows)
>>> res = sp.levinsohn_petrin(df, output="y", free="l", state="k", proxy="m")
>>> sorted(res.coef.keys())
['k', 'l']
>>> res.method
'lp'

References

Levinsohn, J. & Petrin, A. (2003). Estimating production functions using inputs to control for unobservables. Review of Economic Studies, 70(2), 317-341.

ackerberg_caves_frazer

ackerberg_caves_frazer(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 3, productivity_degree: int = 1, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Ackerberg-Caves-Frazer (2015) production function estimator.

Corrects the OP / LP "functional dependence" identification problem: when free inputs (labor) are chosen at the same time as the proxy, both are functions of the same state variables, so the labor coefficient is not identified from the stage-1 polynomial. ACF moves all coefficient identification to stage 2, instrumenting free inputs with their lagged values and using contemporaneous state inputs as their own instruments.
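The functional-dependence problem shows up directly as a rank deficiency in the stage-1 design matrix. A sketch under assumed linear policy functions: if labor and materials are both chosen from the same (ω, k), labor is an exact linear function of (k, m) and its column is collinear with the polynomial; an independent labor shock restores full rank. The data and the `stage1_design` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
k = rng.normal(1.0, 0.5, n)
omega = rng.normal(0.0, 0.3, n)
m = 0.8 * omega + 0.4 * k  # proxy: a function of (omega, k)


def stage1_design(l, k, m):
    """[l, 1, k, m, k^2, k*m, m^2]: free input plus a degree-2 polynomial in (k, m)."""
    return np.column_stack([l, np.ones_like(k), k, m, k * k, k * m, m * m])


# Labor chosen at the same time as materials, from the same (omega, k): no own shock
l_dependent = 0.5 * omega + 0.3 * k
# Labor with an independent shock (e.g. a labor-price or timing shock)
l_shocked = l_dependent + rng.normal(0.0, 0.2, n)

rank_dep = np.linalg.matrix_rank(stage1_design(l_dependent, k, m))
rank_ok = np.linalg.matrix_rank(stage1_design(l_shocked, k, m))
print(rank_dep, rank_ok)  # prints: 6 7
```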

Parameters:

Name	Type	Description	Default
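`data`	`DataFrame`	Long-form panel: one row per (firm, year).	required
`output`	`str`	Log output column.	`'y'`
`free`	`str or list`	Free inputs; instrumented with their lag in stage 2.	`["l"]`
`state`	`str or list`	State inputs (capital); used as their own instruments in stage 2.	`["k"]`
`proxy`	`str`	Intermediate input (materials).	`"m"`
`panel_id`	`str`	Firm identifier column.	`'id'`
`time`	`str`	Year identifier column.	`'year'`
`polynomial_degree`	`int`	Stage-1 control function polynomial degree.	`3`
`productivity_degree`	`int`	Degree of the polynomial g in the AR productivity process.	`1`
`functional_form`	`('cobb-douglas', 'translog')`	Production function form.	`'cobb-douglas'`
`boot_reps`	`int`	Firm-cluster bootstrap replications. `0` ⇒ NaN standard errors.	`0`
`seed`	`int`	Bootstrap RNG seed.	`None`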

Returns:

Type	Description
`ProductionResult`

Notes

Requires at least two consecutive time periods per firm so that lagged labor exists.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(30):
...     k = rng.normal(2.0, 0.5)
...     omega = rng.normal(0.0, 0.3)
...     for yr in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.2)
...         k = 0.9 * k + 0.3 * rng.normal(1.0, 0.3)
...         l = 0.6 * k + omega + rng.normal(1.0, 0.2)
...         m = 0.5 * k + 0.5 * l + omega + rng.normal(0.5, 0.2)
...         y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1)
...         rows.append(dict(id=fid, year=yr, y=y, l=l, k=k, m=m))
>>> df = pd.DataFrame(rows)
>>> res = sp.ackerberg_caves_frazer(df, output="y", free="l", state="k", proxy="m")
>>> sorted(res.coef.keys())
['k', 'l']
>>> res.method
'acf'

References

Ackerberg, D.A., Caves, K. & Frazer, G. (2015). Identification properties of recent production function estimators. Econometrica, 83(6), 2411-2451.

wooldridge_prod

wooldridge_prod(data: DataFrame, output: str = 'y', free: Sequence[str] | str | None = None, state: Sequence[str] | str | None = None, proxy: str = 'm', panel_id: str = 'id', time: str = 'year', polynomial_degree: int = 2, productivity_degree: int = 2, functional_form: str = 'cobb-douglas', boot_reps: int = 0, seed: Optional[int] = None) -> ProductionResult

Wooldridge (2009) joint production function estimator (stacked NLS).

Estimates (beta_l, beta_k) jointly with the nonparametric control function h(m, k) and the productivity Markov polynomial g(omega_{t-1}) by minimizing the sum of squared residuals over a stacked system of two equations: the output equation with h(m, k) standing in for productivity, and the same equation with g(h(m_{t-1}, k_{t-1})) substituted for productivity. This is equivalent to one-step GMM with an identity weight matrix and instruments equal to the regressors (NLS). A full GMM version with optimal weighting is on the roadmap.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long panel with one row per (firm, year).	required
`output`	`str`	Log output column.	`"y"`
`free`	`str or list`	Free inputs (labor).	`["l"]`
`state`	`str or list`	State inputs (capital).	`["k"]`
`proxy`	`str`	Productivity proxy (typically intermediate input).	`"m"`
`panel_id`	`str`	Firm identifier column.	`'id'`
`time`	`str`	Year identifier column.	`'year'`
`polynomial_degree`	`int`	Degree of `h(m, k)`. Smaller than the OP/LP/ACF default because the joint problem is higher-dimensional.	`2`
`productivity_degree`	`int`	Degree of the AR polynomial `g(omega_{t-1})`.	`2`
`functional_form`	`('cobb-douglas', 'translog')`	Only `'cobb-douglas'` is currently supported; `'translog'` raises `NotImplementedError`.	`'cobb-douglas'`
`boot_reps`	`int`	Firm-cluster bootstrap replications. `0` ⇒ NaN standard errors.	`0`
`seed`	`int`	Bootstrap RNG seed.	`None`

Returns:

Type	Description
`ProductionResult`

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for fid in range(30):
...     k = rng.normal(2.0, 0.5)
...     omega = rng.normal(0.0, 0.3)
...     for yr in range(8):
...         omega = 0.7 * omega + rng.normal(0.0, 0.2)
...         k = 0.9 * k + 0.3 * rng.normal(1.0, 0.3)
...         l = 0.6 * k + omega + rng.normal(1.0, 0.2)
...         m = 0.5 * k + 0.5 * l + omega + rng.normal(0.5, 0.2)
...         y = 0.6 * l + 0.3 * k + omega + rng.normal(0.0, 0.1)
...         rows.append(dict(id=fid, year=yr, y=y, l=l, k=k, m=m))
>>> df = pd.DataFrame(rows)
>>> res = sp.wooldridge_prod(df, output="y", free="l", state="k", proxy="m")
>>> sorted(res.coef.keys())
['k', 'l']
>>> res.method
'wrdg'

References

Wooldridge, J.M. (2009). On estimating firm-level production functions using proxy variables to control for unobservables. Economics Letters, 104(3), 112-114.
