statspai.frontier

frontier

Stochastic frontier analysis (SFA).

Cross-sectional estimators: :func:frontier (half-normal / exponential / truncated-normal; supports heteroskedastic sigma_u & sigma_v plus inefficiency determinants emean).

Panel estimators: :func:xtfrontier with model in {'ti', 'tvd', 'bc95', 'tfe', 'tre'} (Pitt-Lee 1981; Battese-Coelli 1992; Battese-Coelli 1995; Greene 2005 true fixed effects and true random effects).

Helpers: :func:te_summary, :func:te_rank, :func:translog_design.

FrontierResult

Bases: EconometricResults

Result object returned by :func:frontier and :func:xtfrontier.

Extends :class:~statspai.core.results.EconometricResults with efficiency-score access, LR tests, and bootstrap helpers.

summary

summary(alpha: float = 0.05) -> str

Formatted summary table (Stata-style SFA block).

Overrides :class:EconometricResults.summary to hide per-observation diagnostic arrays and surface the SFA-specific scalars.

efficiency

efficiency(method: Optional[str] = None) -> Series

Return unit-level technical efficiency scores.

Parameters:

Name Type Description Default
method ('bc', 'jlms')

'bc': Battese-Coelli (1988) E[exp(-u)|eps]. 'jlms': Jondrow-Lovell-Materov-Schmidt (1982) exp(-E[u|eps]). If None (the default), uses the te_method stored at fit time, which is 'bc' unless it was overridden in the fitting call.

None
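To make the difference between the two estimators concrete, here is a minimal sketch of the textbook closed forms for a half-normal model with constant sigma_u and sigma_v. The function names (posterior_params, te_jlms, te_bc) are hypothetical and are not part of statspai; the library's internal implementation may differ, for example when usigma, vsigma or emean covariates are present.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal pdf / cdf

def posterior_params(eps, sigma_u, sigma_v, cost=False):
    """Location and scale of u | eps for half-normal u (before truncation at 0)."""
    sign = 1.0 if cost else -1.0  # eps = v + u (cost) or eps = v - u (production)
    s2 = sigma_u**2 + sigma_v**2
    mu_star = sign * eps * sigma_u**2 / s2
    sigma_star = sigma_u * sigma_v / sqrt(s2)
    return mu_star, sigma_star

def te_jlms(eps, sigma_u, sigma_v, cost=False):
    """exp(-E[u | eps]) (Jondrow et al. 1982)."""
    mu_s, sig_s = posterior_params(eps, sigma_u, sigma_v, cost)
    a = mu_s / sig_s
    return exp(-(mu_s + sig_s * _N.pdf(a) / _N.cdf(a)))

def te_bc(eps, sigma_u, sigma_v, cost=False):
    """E[exp(-u) | eps] (Battese-Coelli 1988)."""
    mu_s, sig_s = posterior_params(eps, sigma_u, sigma_v, cost)
    a = mu_s / sig_s
    return exp(-mu_s + 0.5 * sig_s**2) * _N.cdf(a - sig_s) / _N.cdf(a)

# Same residual, two estimators: by Jensen's inequality the BC score
# is never below the JLMS score.
jlms_score = te_jlms(-0.3, sigma_u=0.4, sigma_v=0.2)
bc_score = te_bc(-0.3, sigma_u=0.4, sigma_v=0.2)
```

The two scores are usually close; they differ most when the posterior spread of u given eps is large.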

inefficiency

inefficiency(method: str = 'jlms') -> Series

Return E[u|eps] (inefficiency), Jondrow et al. (1982).

predict

predict(new_data: DataFrame, what: str = 'frontier') -> Series

Out-of-sample prediction.

Parameters:

Name Type Description Default
new_data DataFrame

Must contain the frontier regressors and, if the model has usigma / vsigma / emean covariates, those columns too. For what='conditional_*' the dependent variable y must also be present so that the composed-error residual can be computed and conditioned on. Rows with any missing value are dropped.

required
what {'frontier', 'expected_inefficiency', 'expected_efficiency', 'conditional_inefficiency', 'conditional_efficiency'}
  • 'frontier' — deterministic frontier x_new' beta.
  • 'expected_inefficiency' — marginal E[u_new].
  • 'expected_efficiency' — marginal E[exp(-u_new)].
  • 'conditional_inefficiency' — Jondrow posterior E[u | eps_new] where eps_new = y_new - x_new'beta; requires y column in new_data.
  • 'conditional_efficiency' — Battese-Coelli E[exp(-u) | eps_new]; requires y.
'frontier'

Returns:

Type Description
Series

Indexed by the (post-dropna) rows of new_data.
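The marginal quantities listed above have simple closed forms in the half-normal case. The sketch below assumes a homoskedastic half-normal u with scale sigma_u; the helper names are hypothetical, and the formulas statspai applies for other distributions or with usigma / emean covariates are not shown here.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def expected_inefficiency_hn(sigma_u):
    """E[u] for u ~ |N(0, sigma_u^2)|."""
    return sigma_u * sqrt(2.0 / pi)

def expected_efficiency_hn(sigma_u):
    """E[exp(-u)] for u ~ |N(0, sigma_u^2)|."""
    return 2.0 * exp(0.5 * sigma_u**2) * NormalDist().cdf(-sigma_u)

e_u = expected_inefficiency_hn(0.5)
e_te = expected_efficiency_hn(0.5)
```

Unlike the 'conditional_*' options, these values depend only on the fitted distribution of u, so no y column is needed to evaluate them.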

marginal_effects

marginal_effects(kind: str = 'inefficiency', source: str = 'emean', at: str = 'observation') -> DataFrame

Marginal effects of inefficiency-shifting covariates.

Parameters:

Name Type Description Default
kind 'inefficiency'

Currently only E[u] marginal effects are supported.

'inefficiency'
source ('emean', 'usigma')

'emean' — derivative wrt the BC95 mean covariates z via mu_i = delta'[1, z_i]. Requires dist='truncated-normal' and a model fitted with emean=[...]. 'usigma' — derivative wrt the Caudill-Ford-Gropper (1995) sigma_u covariates w via ln sigma_u_i = gamma'[1, w_i]. Requires a model fitted with usigma=[...].

'emean'
at ('observation', 'mean', 'ame')
'observation'
Formulas

  • emean (truncated-normal): d E[u_i] / d z_ij = delta_j * [1 - (mu/sigma) * phi/Phi - (phi/Phi)^2]
  • usigma (half-normal): d E[u_i] / d w_ij = gamma_j * sigma_u_i * sqrt(2/pi)
  • usigma (exponential): d E[u_i] / d w_ij = gamma_j * sigma_u_i
  • usigma (truncated-normal): d E[u_i] / d w_ij = gamma_j * sigma_u_i * [phi/Phi + ratio * phi/Phi * (phi/Phi + ratio)]

Here phi and Phi are the standard normal pdf and cdf evaluated at ratio = mu_i / sigma_u_i (written mu/sigma in the emean formula). The usigma results follow from the chain rule through sigma_u_i = exp(gamma'[1, w_i]).
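The truncated-normal formulas can be checked against finite differences of the truncated-normal mean E[u] = mu + sigma * phi(mu/sigma) / Phi(mu/sigma). The sketch below does that for one parameter point; the function names are hypothetical and the code is an independent check of the algebra, not statspai's implementation.

```python
from statistics import NormalDist

_N = NormalDist()

def mean_truncnormal(mu, sigma):
    """E[u] for u ~ N(mu, sigma^2) truncated below at zero."""
    a = mu / sigma
    return mu + sigma * _N.pdf(a) / _N.cdf(a)

def me_emean(delta_j, mu, sigma):
    """d E[u] / d z_j when mu = delta'[1, z]."""
    a = mu / sigma
    lam = _N.pdf(a) / _N.cdf(a)
    return delta_j * (1.0 - a * lam - lam**2)

def me_usigma_tn(gamma_j, mu, sigma):
    """d E[u] / d w_j when ln sigma = gamma'[1, w] (truncated-normal u)."""
    a = mu / sigma
    lam = _N.pdf(a) / _N.cdf(a)
    return gamma_j * sigma * (lam + a * lam * (lam + a))

# Central finite differences at an arbitrary point.
mu0, sigma0, h = 0.3, 0.5, 1e-6
fd_mu = (mean_truncnormal(mu0 + h, sigma0) - mean_truncnormal(mu0 - h, sigma0)) / (2 * h)
fd_sigma = (mean_truncnormal(mu0, sigma0 + h) - mean_truncnormal(mu0, sigma0 - h)) / (2 * h)
```

With delta_j = 1 the emean effect should match fd_mu, and with gamma_j = 1 the usigma effect should match sigma0 * fd_sigma, since d sigma / d w_j = gamma_j * sigma.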

returns_to_scale

returns_to_scale(inputs: Optional[List[str]] = None, alpha: float = 0.05) -> Dict[str, float]

Sum of input elasticities (RTS) with Wald test H0: RTS = 1 (CRS).

Parameters:

Name Type Description Default
inputs list of str

Input-variable names (should be log-transformed inputs in a Cobb-Douglas frontier). Defaults to self.data_info['regressors'].

None
alpha float
0.05

Returns:

Type Description
dict with keys: ``rts``, ``se``, ``statistic``, ``pvalue``,
``ci_lower``, ``ci_upper``, ``interpretation``.
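As an illustration of the test described here, the following sketch computes the sum of elasticities and a Wald test of RTS = 1 from a coefficient vector and its covariance matrix. It assumes a z-statistic with a two-sided normal p-value; whether statspai reports z or its square (a chi-squared statistic) is not stated above, and the function name rts_wald is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def rts_wald(beta, vcov, alpha=0.05):
    """Sum of input elasticities with a Wald test of H0: sum = 1."""
    beta = np.asarray(beta, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    ones = np.ones_like(beta)
    rts = float(ones @ beta)
    se = float(np.sqrt(ones @ vcov @ ones))  # delta method for a linear sum
    z = (rts - 1.0) / se
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "rts": rts,
        "se": se,
        "statistic": z,
        "pvalue": 2.0 * (1.0 - NormalDist().cdf(abs(z))),
        "ci_lower": rts - crit * se,
        "ci_upper": rts + crit * se,
    }

# Made-up elasticities and covariance block for two log inputs.
out = rts_wald([0.4, 0.7], [[0.01, -0.002], [-0.002, 0.01]])
```

The covariance between the elasticity estimates matters: ignoring the off-diagonal terms would misstate the standard error of the sum.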

lr_test_no_inefficiency

lr_test_no_inefficiency() -> Dict[str, float]

One-sided LR test H0: sigma_u = 0 (mixed chi-bar squared).
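Because sigma_u = 0 lies on the boundary of the parameter space, the LR statistic is not chi-squared with one degree of freedom. A minimal sketch of the p-value follows, assuming the usual 50:50 mixture of chi-squared(0) and chi-squared(1) that applies when a single parameter is restricted (half-normal or exponential u). The function name is hypothetical, and the mixture statspai uses for other distributions is not documented here.

```python
from math import sqrt
from statistics import NormalDist

def chibar2_pvalue(lr_stat):
    """P-value under 0.5 * chi2(0) + 0.5 * chi2(1).

    P(chi2(1) > x) = 2 * (1 - Phi(sqrt(x))), so half of it is 1 - Phi(sqrt(x)).
    """
    if lr_stat <= 0.0:
        return 1.0
    return 1.0 - NormalDist().cdf(sqrt(lr_stat))

p_at_critical = chibar2_pvalue(2.7055)  # 5% critical value of the mixture
```

The practical consequence is that the 5% critical value is about 2.71 rather than the 3.84 of a standard chi-squared(1) test.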

efficiency_ci

efficiency_ci(alpha: float = 0.05, B: int = 500, method: Optional[str] = None, seed: Optional[int] = 0) -> DataFrame

Parametric-bootstrap CI for unit-level efficiency scores.

Draws (u_b, v_b) ~ posterior predictive using the fitted variance parameters, then recomputes the Jondrow posterior for the resampled composed error. Returns a DataFrame indexed like :meth:efficiency with columns ['point', 'lower', 'upper'].

MetafrontierResult dataclass

Container for a metafrontier fit.

MalmquistResult dataclass

Container for Malmquist productivity index decomposition.

index_table instance-attribute

index_table: DataFrame

Wide table: one row per (id, period pair) with columns ['m_index', 'ec', 'tc'] plus the original id / period columns.

period_frontiers instance-attribute

period_frontiers: Dict[Any, FrontierResult]

Frontier fit per period.

summary_by_period instance-attribute

summary_by_period: DataFrame

Mean M / EC / TC per period transition.

frontier

frontier(data: DataFrame, y: str, x: List[str], *, dist: str = 'half-normal', cost: bool = False, usigma: Optional[List[str]] = None, vsigma: Optional[List[str]] = None, emean: Optional[List[str]] = None, te_method: str = 'bc', vce: str = 'oim', cluster: Optional[str] = None, B: int = 400, seed: Optional[int] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05, start: Optional[ndarray] = None) -> FrontierResult

Estimate a cross-sectional stochastic frontier model by ML.

Parameters:

Name Type Description Default
data DataFrame

Cross-sectional data. Rows with missing values in any referenced column are dropped.

required
y str

Dependent variable (output for production, cost for cost frontier).

required
x list of str

Frontier regressors (a constant is added automatically).

required
dist ('half-normal', 'exponential', 'truncated-normal')

Distribution of the inefficiency term u.

'half-normal'
cost bool

If True, estimate a cost frontier (composed error v + u).

False
usigma list of str

Columns parameterizing ln sigma_u_i = gamma_u' [1, w_i] (Caudill-Ford-Gropper 1995).

None
vsigma list of str

Columns parameterizing ln sigma_v_i = gamma_v' [1, r_i] (Wang 2002).

None
emean list of str

Columns parameterizing mu_i = delta' [1, z_i] for the truncated normal (Battese-Coelli 1995; Kumbhakar-Ghosh-McGuckin 1991). Requires dist='truncated-normal'.

None
te_method ('bc', 'jlms')

Default technical-efficiency formula accessed via .efficiency().

'bc'
vce ('oim', 'opg', 'robust')

Variance-covariance estimator: 'oim' — observed information matrix (inverse numerical Hessian). 'opg' — outer product of gradients (Berndt-Hall-Hall-Hausman). 'robust' — sandwich H^{-1} (S' S) H^{-1} (White 1982).

'oim'
cluster str

Cluster variable for cluster-robust SE (Liang-Zeger 1986). When specified, implies vce='robust' aggregated over clusters.

None
maxiter int
500
tol float
1e-8
alpha float
0.05
start ndarray

User-supplied starting values for the full parameter vector.

None

Returns:

Type Description
class:`FrontierResult`

Examples:

>>> import statspai as sp
>>> res = sp.frontier(df, y='log_y', x=['log_k', 'log_l'])
>>> res.efficiency().describe()
>>> res.lr_test_no_inefficiency()
>>> sp.frontier(df, y='log_y', x=['log_k', 'log_l'],
...             dist='truncated-normal', emean=['firm_age'])
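The three vce options differ only in how the Hessian H and the per-observation score matrix S are combined. The sketch below shows those combinations, plus the cluster aggregation, on a toy Gaussian linear model where both are available in closed form. The function name vce_estimators and the toy data are assumptions for illustration; this is not the statspai code path.

```python
import numpy as np

def vce_estimators(H, S, clusters=None):
    """Return OIM, OPG and sandwich covariance matrices.

    H: (k, k) Hessian of the negative log-likelihood at the optimum.
    S: (n, k) matrix of per-observation score vectors.
    clusters: optional length-n labels; scores are summed within each
              cluster before the sandwich "meat" is formed.
    """
    H_inv = np.linalg.inv(H)
    opg = np.linalg.inv(S.T @ S)
    if clusters is not None:
        _, idx = np.unique(np.asarray(clusters), return_inverse=True)
        idx = idx.ravel()
        S_g = np.zeros((idx.max() + 1, S.shape[1]))
        np.add.at(S_g, idx, S)  # cluster-level score sums
        S = S_g
    robust = H_inv @ (S.T @ S) @ H_inv
    return {"oim": H_inv, "opg": opg, "robust": robust}

# Toy model: y = X b + e with unit error variance, so the score of
# observation i is x_i * (y_i - x_i'b) and H = X'X.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
b = np.linalg.solve(X.T @ X, X.T @ y)
S = X * (y - X @ b)[:, None]
vce = vce_estimators(X.T @ X, S)
se_robust = np.sqrt(np.diag(vce["robust"]))
```

When every observation is its own cluster, the clustered sandwich reduces to the ordinary robust estimator.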

xtfrontier

xtfrontier(data: DataFrame, y: str, x: List[str], id: str, time: Optional[str] = None, *, model: str = 'ti', dist: str = 'half-normal', cost: bool = False, emean: Optional[List[str]] = None, vce: str = 'oim', cluster: Optional[str] = None, bias_correct: bool = False, n_quad: int = 24, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult

Panel stochastic frontier estimator.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x list of str
required
id str

Panel unit identifier.

required
time str

Time variable (required for model='tvd' and recommended for all panel models; falls back to observation order otherwise).

None
model ('ti', 'tvd', 'bc95', 'tfe', 'tre')

  • 'ti': Pitt-Lee (1981) time-invariant inefficiency.
  • 'tvd': Battese-Coelli (1992) time-varying decay.
  • 'bc95': Battese-Coelli (1995) inefficiency effects model (requires emean).
  • 'tfe': Greene (2005) True Fixed Effects, i.e. unit dummies plus a cross-sectional composed error. Recommended for T >= ~10.
  • 'tre': Greene (2005) True Random Effects, with alpha_i ~ N(0, sigma_alpha^2) integrated out by Gauss-Hermite quadrature.

'ti'
dist ('half-normal', 'truncated-normal')

Applies to 'ti', 'tvd', 'tfe' and 'tre'. 'bc95' always uses the truncated normal.

'half-normal'
cost bool
False
emean list of str

Required for model='bc95'; inefficiency determinants z_it.

None
vce ('oim', 'opg', 'robust', 'cluster')

Variance-covariance estimator. 'oim' uses the inverse observed information. 'opg' is the outer product of gradients (BHHH). 'robust' is the sandwich H^-1 (S'S) H^-1. Passing cluster= implies cluster-robust SEs (Liang-Zeger 1986). Note: vce='bootstrap' is only available on the cross-sectional :func:frontier; for panel models use 'robust' with cluster=id instead.

'oim'
cluster str

Column name for cluster-robust SEs. Defaults to id whenever vce != 'oim' (the natural grouping for panels).

None
bias_correct bool

TFE-only. If True, applies Dhaene-Jochmans (2015) split-panel jackknife to reduce the O(1/T) incidental-parameters bias on beta and sigma_u.

False
n_quad int

TRE-only. Number of Gauss-Hermite nodes used to integrate out alpha_i. Increase to 48 or 64 when sigma_alpha is large relative to sigma_v (large between-firm heterogeneity) so that the quadrature tails are not truncated. A warning is emitted when the fitted sigma_alpha suggests insufficient tail coverage at the chosen n_quad.

24
maxiter see :func:`frontier`.
500
tol see :func:`frontier`.
1e-08
alpha see :func:`frontier`.
0.05
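For the 'tvd' model listed in the table above, the Battese-Coelli (1992) specification scales a unit-level draw u_i by a deterministic function of time, u_it = exp(-eta * (t - T_i)) * u_i, where T_i is the unit's last period. The sketch below traces that path for made-up values; the function name and the numbers are illustrative only.

```python
from math import exp

def bc92_inefficiency(u_i, t, T_i, eta):
    """Time-varying decay: u_it = exp(-eta * (t - T_i)) * u_i."""
    return exp(-eta * (t - T_i)) * u_i

# With eta > 0 inefficiency falls over time and reaches u_i in the last period.
path = [bc92_inefficiency(0.3, t, T_i=5, eta=0.1) for t in range(1, 6)]
```

Setting eta = 0 gives a constant path, which is the time-invariant 'ti' case.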

Returns:

Type Description
class:`~statspai.frontier.FrontierResult`
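The role of n_quad in the 'tre' model can be seen in a small sketch of Gauss-Hermite integration over alpha ~ N(0, sigma_alpha^2). After the change of variable alpha = sqrt(2) * sigma_alpha * x, the expectation becomes a weighted sum over the quadrature nodes. The function gh_expectation is hypothetical, and how statspai applies the rule inside the likelihood is not shown here.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_expectation(g, sigma_alpha, n_quad=24):
    """Approximate E[g(alpha)] for alpha ~ N(0, sigma_alpha^2)."""
    nodes, weights = hermgauss(n_quad)  # rule for the weight exp(-x^2)
    alpha = np.sqrt(2.0) * sigma_alpha * nodes
    return float(np.sum(weights * g(alpha)) / np.sqrt(np.pi))

# Check against a case with a known answer: E[exp(alpha)] = exp(sigma^2 / 2).
approx = gh_expectation(np.exp, sigma_alpha=0.8)
exact = float(np.exp(0.5 * 0.8**2))
```

A smooth integrand like this one is handled well by a modest number of nodes. The per-unit likelihood in a panel is a product over periods and can be much more sharply peaked in alpha, which is the situation where raising n_quad helps.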
Notes

σ_u and σ_v conventions, and Stata parity gap.

sigma_u and sigma_v in model_info are the underlying normal standard deviations of the half-normal inefficiency and the symmetric noise term, respectively. This matches R's frontier::sfa(... , truncNorm=FALSE, timeEffect=FALSE) to rel < 1e-4 on the production-frontier DGP in tests/r_parity/29_panel_sfa.

Stata's xtfrontier ..., ti reports an e(sigma_u) value that can be ~40 % larger than the one returned here, while e(sigma_v) matches at < 1 %. This is a known parity gap; in our parity DGP Stata's reported σ_u corresponds to a different point on the same likelihood surface (the likelihood is mildly multimodal on Pitt-Lee for short panels). When porting Stata code, treat σ_u parity at the < 10 % level as "structurally aligned" and cross-check via gamma = sigma_u^2 / (sigma_u^2 + sigma_v^2) or by the mean efficiency in model_info['mean_efficiency_bc'], which are far less sensitive to the local-optimum gap.

te_summary

te_summary(result, method: Optional[str] = None) -> DataFrame

Return a small descriptive DataFrame of TE scores (summary stats only).

te_rank

te_rank(result, method: Optional[str] = None, with_ci: bool = False, alpha: float = 0.05, B: int = 500, seed: Optional[int] = 0) -> DataFrame

Return efficiency scores sorted descending, with rank column.

If with_ci=True, calls :meth:FrontierResult.efficiency_ci for parametric-bootstrap bounds. For very large samples prefer a small B.

zisf

zisf(data: DataFrame, y: str, x: List[str], *, zprob: Optional[List[str]] = None, dist: str = 'half-normal', cost: bool = False, vce: str = 'oim', cluster: Optional[str] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult

Zero-Inefficiency Stochastic Frontier (Kumbhakar-Parmeter-Tsionas 2013).

The population is a mixture of two regimes:

  • Fully efficient (share p_i): y_i = x_i' beta + v_i.
  • Inefficient (share 1 - p_i): standard composed-error frontier y_i = x_i' beta + v_i + sign * u_i, with sign = -1 for a production frontier and sign = +1 for a cost frontier (cost=True).

The mixing probability p_i is parameterised via a logit link: p_i = expit(z_i' theta) where z_i = [1, zprob_vars_i]. If zprob=None the probability is constant across observations.
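The two-regime structure described above translates into a mixture density for the residual eps = y - x'beta. The sketch below writes it out for the half-normal case using the standard composed-error density; the function names and argument layout are assumptions for illustration and do not mirror statspai internals.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()

def sf_density_hn(eps, sigma_u, sigma_v, cost=False):
    """Density of eps = v - u (production) or v + u (cost), half-normal u."""
    sigma = sqrt(sigma_u**2 + sigma_v**2)
    lam = sigma_u / sigma_v
    sign = 1.0 if cost else -1.0
    return (2.0 / sigma) * _N.pdf(eps / sigma) * _N.cdf(sign * eps * lam / sigma)

def zisf_density(eps, z, theta, sigma_u, sigma_v, cost=False):
    """Mixture of a noise-only regime and a composed-error regime."""
    index = sum(t * zj for t, zj in zip(theta, z))  # z includes the constant 1
    p = 1.0 / (1.0 + exp(-index))                   # logit link for P(fully efficient)
    noise_only = _N.pdf(eps / sigma_v) / sigma_v
    return p * noise_only + (1.0 - p) * sf_density_hn(eps, sigma_u, sigma_v, cost)

# Constant-only mixing probability with theta = 0, i.e. p = 0.5.
dens_at_zero = zisf_density(0.0, z=[1.0], theta=[0.0], sigma_u=0.4, sigma_v=0.2)
```

Summing the log of this density over observations gives the log-likelihood that a ZISF estimator maximizes over beta, theta, sigma_u and sigma_v.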

Parameters:

Name Type Description Default
zprob list of str

Covariates for the mixing probability; a constant is added.

None
dist 'half-normal'

Distribution of u in the inefficient regime. Currently only half-normal is supported (KPT 2013 baseline).

'half-normal'

Returns:

Type Description
class:`~statspai.frontier.FrontierResult`

lcsf

lcsf(data: DataFrame, y: str, x: List[str], *, z_class: Optional[List[str]] = None, dist: str = 'half-normal', cost: bool = False, vce: str = 'oim', cluster: Optional[str] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult

Two-class Latent-Class SFA (Orea-Kumbhakar 2004; Greene 2005).

Each observation belongs latently to class 1 or class 2, each with its own frontier coefficients beta_k and variance parameters sigma_v_k, sigma_u_k. Class probability optionally depends on z_class via logit.
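Posterior class probabilities in a two-class model follow from Bayes' rule applied to the class-specific likelihood contributions. The sketch below computes the class-1 posterior from two log-likelihood values and a prior, working in logs for numerical stability. The function name and inputs are hypothetical, and the exact quantities statspai stores may be computed differently.

```python
from math import exp, log

def posterior_class1(loglik1, loglik2, prior1):
    """P(class 1 | data) from class log-likelihoods and the prior P(class 1)."""
    a = log(prior1) + loglik1
    b = log(1.0 - prior1) + loglik2
    m = max(a, b)  # subtract the max so the exponentials cannot underflow to 0/0
    return exp(a - m) / (exp(a - m) + exp(b - m))

# Very negative log-likelihoods are handled without underflow.
post = posterior_class1(-1000.0, -1005.0, prior1=0.5)
```

When the two classes fit an observation equally well, the posterior equals the prior, so the z_class covariates then drive the classification.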

Parameters:

Name Type Description Default
z_class list of str

Covariates shifting the class-1 logit probability.

None

Returns:

Type Description
FrontierResult with extended ``params`` block and per-obs posterior
class probabilities in ``diagnostics['p_class1_posterior']``.

translog_design

translog_design(data: DataFrame, inputs: List[str], *, include_interactions: bool = True, include_squares: bool = True, interaction_prefix: str = '') -> DataFrame

Build a translog design matrix from Cobb-Douglas inputs.

Translog is log y = alpha + sum_k beta_k * log x_k + 0.5 sum_k sum_l gamma_{kl} * log x_k * log x_l.

This helper takes input columns (already in log form) and returns a DataFrame with the original columns plus squares x_k^2 / 2 and cross-products x_k * x_l that can be fed straight to :func:frontier / :func:xtfrontier as additional regressors.
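To show what the appended terms look like, here is a hand-rolled sketch of the same construction with pandas. The column names it generates ("log_k_sq", "log_k_x_log_l") are made up for illustration; the names that translog_design actually produces are not specified above, so read them from df.attrs in real use.

```python
from itertools import combinations

import pandas as pd

def translog_columns(df, inputs, prefix=""):
    """Append 0.5 * x_k^2 and x_k * x_l (k < l) for already-logged inputs."""
    out = df.copy()
    added = []
    for k in inputs:
        name = f"{prefix}{k}_sq"
        out[name] = 0.5 * out[k] ** 2  # translog convention: half the square
        added.append(name)
    for k, l in combinations(inputs, 2):
        name = f"{prefix}{k}_x_{l}"
        out[name] = out[k] * out[l]
        added.append(name)
    return out, added

toy = pd.DataFrame({"log_k": [1.0, 2.0], "log_l": [3.0, 4.0]})
toy_tl, added_terms = translog_columns(toy, ["log_k", "log_l"])
```

With K inputs this adds K squares and K * (K - 1) / 2 interactions, so the regressor count grows quickly as inputs are added.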

Parameters:

Name Type Description Default
data DataFrame
required
inputs list of str

Columns containing log x_k terms (already log-transformed).

required
include_interactions bool

If True, adds x_k * x_l for k < l.

True
include_squares bool

If True, adds 0.5 * x_k^2 terms (translog convention).

True
interaction_prefix str

Optional prefix for the generated columns (e.g., "tl_").

""

Returns:

Type Description
DataFrame

Original data + appended translog terms. Two lists are stored on df.attrs for convenience (neither is auto-consumed by :func:frontier or :func:xtfrontier — the user must pass one of them explicitly as x=):

  • df.attrs['translog_terms'] — all regressors for a translog frontier: original inputs + squares + interactions. Pass this directly to sp.frontier(..., x=terms).
  • df.attrs['translog_added_terms'] — only the new columns appended by this helper (squares and interactions). Use this if you already have inputs in your x list and just want to extend it.

Examples:

>>> import statspai as sp
>>> from statspai.frontier import translog_design
>>> df_tl = translog_design(df, inputs=["log_k", "log_l"])
>>> # Option A — one-liner, pass the full translog regressor list:
>>> terms = df_tl.attrs["translog_terms"]
>>> sp.frontier(df_tl, y="log_y", x=terms)
>>> # Option B — extend an existing x list without double-counting:
>>> base = ["log_k", "log_l"]
>>> sp.frontier(df_tl, y="log_y",
...             x=base + df_tl.attrs["translog_added_terms"])