statspai.frontier¶
frontier ¶
Stochastic frontier analysis (SFA).
Cross-sectional estimators: `frontier` (half-normal / exponential /
truncated-normal; supports heteroskedastic sigma_u and sigma_v plus
inefficiency determinants emean).
Panel estimators: `xtfrontier` with model in
{'ti', 'tvd', 'bc95'} (Pitt-Lee 1981, Battese-Coelli 1992,
Battese-Coelli 1995); the `xtfrontier` parameter table additionally
lists 'tfe' and 'tre'.
Helpers: `te_summary`.
FrontierResult ¶
Bases: EconometricResults
Result object returned by `frontier` and `xtfrontier`.
Extends `statspai.core.results.EconometricResults` with
efficiency-score access, LR tests, and bootstrap helpers.
summary ¶
Formatted summary table (Stata-style SFA block).
Overrides `EconometricResults.summary` to hide per-observation
diagnostic arrays and surface the SFA-specific scalars.
efficiency ¶
Return unit-level technical efficiency scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `method` | `('bc', 'jlms')` | `'bc'` (default): Battese-Coelli (1988) | `'bc'` |
inefficiency ¶
Return E[u|eps] (inefficiency), Jondrow et al. (1982).
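As a minimal sketch of the two scores named above, the following computes the Jondrow et al. (1982) conditional mean E[u|eps] and the Battese-Coelli (1988) score E[exp(-u)|eps] for a normal/half-normal production frontier (eps = v - u). The function name and the parameter values are illustrative assumptions, not the library's internals.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal pdf/cdf


def jlms_bc_half_normal(eps, sigma_u, sigma_v):
    """Return (E[u|eps], E[exp(-u)|eps]) for a production frontier, eps = v - u.

    Hypothetical helper for illustration; assumes the normal/half-normal model.
    """
    s2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / s2        # location of u | eps
    sig_star = sigma_u * sigma_v / sqrt(s2)   # scale of u | eps
    a = mu_star / sig_star
    # JLMS (1982): mean of a normal truncated at zero
    e_u = mu_star + sig_star * _N.pdf(a) / _N.cdf(a)
    # Battese-Coelli (1988): E[exp(-u) | eps]
    te_bc = _N.cdf(a - sig_star) / _N.cdf(a) * exp(-mu_star + 0.5 * sig_star ** 2)
    return e_u, te_bc


# Illustrative residuals and variance parameters (not from any real fit)
for eps in (-0.4, 0.0, 0.3):
    e_u, te_bc = jlms_bc_half_normal(eps, sigma_u=0.3, sigma_v=0.2)
    print(f"eps={eps:+.1f}  E[u|eps]={e_u:.4f}  exp(-E[u|eps])={exp(-e_u):.4f}  BC={te_bc:.4f}")
```

By Jensen's inequality the BC score is never below exp(-E[u|eps]), so the two methods typically differ slightly for the same unit.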
predict ¶
Out-of-sample prediction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `new_data` | `DataFrame` | Must contain the frontier regressors and, if the model has | required |
| `what` | {'frontier', 'expected_inefficiency', 'expected_efficiency', | | `'frontier'` |
Returns:
| Type | Description |
|---|---|
| `Series` | Indexed by the (post-dropna) rows of |
marginal_effects ¶
marginal_effects(kind: str = 'inefficiency', source: str = 'emean', at: str = 'observation') -> DataFrame
Marginal effects of inefficiency-shifting covariates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kind` | `'inefficiency'` | Currently only | `'inefficiency'` |
| `source` | `('emean', 'usigma')` | | `'emean'` |
| `at` | `('observation', 'mean', 'ame')` | | `'observation'` |
Formulas
- emean (truncated-normal): d E[u_i] / d z_ij = delta_j * [1 - (mu/sigma) * phi/Phi - (phi/Phi)^2]
- usigma (half-normal): d E[u_i] / d w_ij = gamma_j * sigma_u_i * sqrt(2/pi)
- usigma (exponential): d E[u_i] / d w_ij = gamma_j * sigma_u_i
- usigma (truncated-normal): d E[u_i] / d w_ij = gamma_j * sigma_u_i * [phi/Phi + ratio * phi/Phi * (phi/Phi - (-ratio))]

The usigma formulas apply the chain rule through sigma_u_i = exp(gamma'[1, w_i]).
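The emean and half-normal usigma formulas can be checked against central finite differences. The sketch below assumes phi/Phi is evaluated at mu/sigma and that mu_i = delta'[1, z_i]; all function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_N = NormalDist()


def mean_u_truncnorm(mu, sigma):
    """E[u] for u ~ N(mu, sigma^2) truncated at zero."""
    a = mu / sigma
    return mu + sigma * _N.pdf(a) / _N.cdf(a)


def me_emean(delta_j, mu, sigma):
    """Closed form: delta_j * [1 - (mu/sigma)*lam - lam^2], lam = phi/Phi at mu/sigma."""
    a = mu / sigma
    lam = _N.pdf(a) / _N.cdf(a)
    return delta_j * (1.0 - a * lam - lam ** 2)


def me_usigma_half_normal(gamma_j, sigma_u_i):
    """Closed form: gamma_j * sigma_u_i * sqrt(2/pi)."""
    return gamma_j * sigma_u_i * sqrt(2.0 / pi)


# Hypothetical coefficients and covariate values
delta0, delta1, z, sigma = 0.1, 0.5, 0.8, 0.4
gamma0, gamma1, w = -1.0, 0.3, 1.5
h = 1e-6  # finite-difference step

# emean: mu_i = delta0 + delta1 * z_i
mu = delta0 + delta1 * z
fd_emean = (mean_u_truncnorm(delta0 + delta1 * (z + h), sigma)
            - mean_u_truncnorm(delta0 + delta1 * (z - h), sigma)) / (2 * h)

# usigma (half-normal): E[u_i] = sigma_u_i * sqrt(2/pi), sigma_u_i = exp(gamma0 + gamma1 * w_i)
def mean_u_half_normal(w_):
    return exp(gamma0 + gamma1 * w_) * sqrt(2.0 / pi)

fd_usigma = (mean_u_half_normal(w + h) - mean_u_half_normal(w - h)) / (2 * h)

print(fd_emean, me_emean(delta1, mu, sigma))
print(fd_usigma, me_usigma_half_normal(gamma1, exp(gamma0 + gamma1 * w)))
```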
returns_to_scale ¶
Sum of input elasticities (RTS) with Wald test H0: RTS = 1 (CRS).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `list of str` | Input-variable names (should be log-transformed inputs in a Cobb-Douglas frontier). Defaults to | `None` |
| `alpha` | `float` | | `0.05` |
Returns:
| Type | Description |
|---|---|
| `dict` | Keys: `rts`, `se`, `statistic`, `pvalue`, `ci_lower`, `ci_upper`, `interpretation`. |
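A minimal sketch of the underlying arithmetic, assuming a two-sided z-type Wald test on the sum of the input coefficients (the library may report an equivalent chi-squared form). The function name and the coefficient and covariance values are hypothetical, and the `interpretation` key is omitted.

```python
from math import erfc, sqrt
from statistics import NormalDist


def returns_to_scale_wald(betas, vcov, alpha=0.05):
    """Wald test of H0: sum(betas) = 1, given the coefficient covariance matrix."""
    k = len(betas)
    rts = sum(betas)
    # Var(sum of betas) = 1' V 1, i.e. the sum of all covariance entries
    se = sqrt(sum(vcov[i][j] for i in range(k) for j in range(k)))
    z = (rts - 1.0) / se
    pvalue = erfc(abs(z) / sqrt(2.0))            # two-sided normal p-value
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return {
        "rts": rts,
        "se": se,
        "statistic": z,
        "pvalue": pvalue,
        "ci_lower": rts - crit * se,
        "ci_upper": rts + crit * se,
    }


# Hypothetical elasticities for log capital and log labor, with their covariance
out = returns_to_scale_wald(
    betas=[0.62, 0.45],
    vcov=[[0.0016, -0.0004], [-0.0004, 0.0025]],
)
print(out)
```

In this illustration the confidence interval covers 1, so constant returns to scale would not be rejected at the 5% level.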
lr_test_no_inefficiency ¶
One-sided LR test H0: sigma_u = 0 (mixed chi-bar squared).
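Because sigma_u = 0 lies on the boundary of the parameter space, the LR statistic is compared with a mixture rather than a plain chi-squared. The sketch below assumes the single-restriction case (a 50:50 mixture of a point mass at zero and chi-squared with one degree of freedom); the function names and log-likelihood values are hypothetical.

```python
from math import erfc, sqrt


def chibar2_pvalue(lr):
    """p-value under the 50:50 mixture of chi2(0) and chi2(1)."""
    if lr <= 0.0:
        return 1.0
    # P(chi2_1 > lr) = erfc(sqrt(lr / 2)); the mixture halves it
    return 0.5 * erfc(sqrt(lr / 2.0))


def lr_no_inefficiency(ll_restricted, ll_sfa):
    """LR statistic for H0: sigma_u = 0, truncated at zero, with its mixture p-value."""
    lr = max(0.0, 2.0 * (ll_sfa - ll_restricted))
    return lr, chibar2_pvalue(lr)


# Hypothetical log-likelihoods: OLS-type restricted model vs. the SFA fit
lr, p = lr_no_inefficiency(ll_restricted=-105.0, ll_sfa=-102.0)
print(f"LR = {lr:.3f}, p = {p:.5f}")
```

The 5% critical value under this mixture is about 2.706 (the 10% point of chi-squared with one degree of freedom), not 3.841.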
efficiency_ci ¶
efficiency_ci(alpha: float = 0.05, B: int = 500, method: Optional[str] = None, seed: Optional[int] = 0) -> DataFrame
Parametric-bootstrap CI for unit-level efficiency scores.
Draws (u_b, v_b) ~ posterior predictive using the fitted
variance parameters, then recomputes the Jondrow posterior for
the resampled composed error. Returns a DataFrame indexed like
`efficiency` with columns ['point', 'lower', 'upper'].
MetafrontierResult dataclass ¶
Container for a metafrontier fit.
MalmquistResult dataclass ¶
Container for Malmquist productivity index decomposition.
index_table instance-attribute ¶
Wide table: one row per (id, period pair) with columns
['m_index', 'ec', 'tc'] plus the original id / period columns.
period_frontiers instance-attribute ¶
period_frontiers: Dict[Any, FrontierResult]
Frontier fit per period.
summary_by_period instance-attribute ¶
Mean M / EC / TC per period transition.
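The columns `m_index`, `ec`, and `tc` suggest the usual decomposition M = EC x TC. The sketch below uses the standard geometric-mean form (Färe et al. 1994) built from four efficiency scores; whether the library computes its index exactly this way is an assumption, and the function and argument names are hypothetical.

```python
from math import isclose, sqrt


def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """Malmquist index for one unit between periods t and t+1.

    d_a_b is the efficiency of the period-b observation measured against
    the period-a frontier (hypothetical naming).
    """
    ec = d_t1_t1 / d_t_t                                  # efficiency change (catch-up)
    tc = sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))      # technical change (frontier shift)
    return {"m_index": ec * tc, "ec": ec, "tc": tc}


# Illustrative efficiency scores, not from any real fit
row = malmquist(d_t_t=0.80, d_t_t1=0.95, d_t1_t=0.70, d_t1_t1=0.85)
print(row)

# The product EC * TC equals the geometric mean of the two period-specific indexes
direct = sqrt((0.95 / 0.80) * (0.85 / 0.70))
print(isclose(row["m_index"], direct))
```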
frontier ¶
frontier(data: DataFrame, y: str, x: List[str], *, dist: str = 'half-normal', cost: bool = False, usigma: Optional[List[str]] = None, vsigma: Optional[List[str]] = None, emean: Optional[List[str]] = None, te_method: str = 'bc', vce: str = 'oim', cluster: Optional[str] = None, B: int = 400, seed: Optional[int] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05, start: Optional[ndarray] = None) -> FrontierResult
Estimate a cross-sectional stochastic frontier model by ML.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Cross-sectional data. Rows with missing values in any referenced column are dropped. | required |
| `y` | `str` | Dependent variable (output for production, cost for cost frontier). | required |
| `x` | `list of str` | Frontier regressors (a constant is added automatically). | required |
| `dist` | `('half-normal', 'exponential', 'truncated-normal')` | Distribution of the inefficiency term | `'half-normal'` |
| `cost` | `bool` | If True, estimate a cost frontier (composed error | `False` |
| `usigma` | `list of str` | Columns parameterizing | `None` |
| `vsigma` | `list of str` | Columns parameterizing | `None` |
| `emean` | `list of str` | Columns parameterizing | `None` |
| `te_method` | `('bc', 'jlms')` | Default technical-efficiency formula accessed via | `'bc'` |
| `vce` | `('oim', 'opg', 'robust')` | Variance-covariance estimator: | `'oim'` |
| `cluster` | `str` | Cluster variable for cluster-robust SE (Liang-Zeger 1986). When specified, implies | `None` |
| `maxiter` | `int` | | `500` |
| `tol` | `float` | | `1e-8` |
| `alpha` | `float` | | `0.05` |
| `start` | `ndarray` | User-supplied starting values for the full parameter vector. | `None` |
Returns:
| Type | Description |
|---|---|
| `FrontierResult` | |
Examples:
xtfrontier ¶
xtfrontier(data: DataFrame, y: str, x: List[str], id: str, time: Optional[str] = None, *, model: str = 'ti', dist: str = 'half-normal', cost: bool = False, emean: Optional[List[str]] = None, vce: str = 'oim', cluster: Optional[str] = None, bias_correct: bool = False, n_quad: int = 24, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult
Panel stochastic frontier estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | | required |
| `y` | `str` | | required |
| `x` | `list of str` | | required |
| `id` | `str` | Panel unit identifier. | required |
| `time` | `str` | Time variable (required for | `None` |
| `model` | `('ti', 'tvd', 'bc95', 'tfe', 'tre')` | | `'ti'` |
| `dist` | `('half-normal', 'truncated-normal')` | For | `'half-normal'` |
| `cost` | `bool` | | `False` |
| `emean` | `list of str` | Required for | `None` |
| `vce` | `('oim', 'opg', 'robust', 'cluster')` | Variance-covariance estimator. | `'oim'` |
| `cluster` | `str` | Column name for cluster-robust SEs. Defaults to | `None` |
| `bias_correct` | `bool` | TFE-only. If True, applies Dhaene-Jochmans (2015) split-panel jackknife to reduce the O(1/T) incidental-parameters bias on | `False` |
| `n_quad` | `int` | TRE-only. Number of Gauss-Hermite nodes used to integrate out | `24` |
| `maxiter` | `int` | See `frontier`. | `500` |
| `tol` | `float` | See `frontier`. | `1e-8` |
| `alpha` | `float` | See `frontier`. | `0.05` |
Returns:
| Type | Description |
|---|---|
| `FrontierResult` | |
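The `bias_correct` option refers to a split-panel jackknife. A minimal sketch of the half-panel version of that correction follows, assuming the estimate from the full panel and from each half of the time dimension are already available; the function name and the toy bias model are hypothetical.

```python
def split_panel_jackknife(theta_full, theta_first_half, theta_second_half):
    """Half-panel jackknife: 2 * full-panel estimate minus the mean of the half-panel estimates."""
    return 2.0 * theta_full - 0.5 * (theta_first_half + theta_second_half)


# Toy illustration: an estimator whose leading bias is exactly c / T
theta0, c, T = 1.0, 0.8, 10


def biased(t_len):
    return theta0 + c / t_len


corrected = split_panel_jackknife(biased(T), biased(T // 2), biased(T // 2))
print(biased(T), corrected)
```

Each half panel has length T/2 and therefore twice the leading bias, which is why the combination cancels the O(1/T) term in this toy case.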
Notes
σ_u and σ_v conventions, and Stata parity gap.
`sigma_u` and `sigma_v` in `model_info` are the underlying
normal standard deviations of the half-normal inefficiency term and the
symmetric noise term, respectively. This matches R's
`frontier::sfa(..., truncNorm=FALSE, timeEffect=FALSE)` to a
relative difference below 1e-4 on the production-frontier DGP in
tests/r_parity/29_panel_sfa.
Stata's `xtfrontier ..., ti` reports an `e(sigma_u)` value that
can be about 40% larger than the one returned here, while `e(sigma_v)`
matches to within 1%. This is a known parity gap: in our parity DGP,
Stata's reported σ_u corresponds to a different point on the same
likelihood surface (the Pitt-Lee likelihood is mildly multimodal
for short panels). When porting Stata code, treat
σ_u parity at the < 10% level as "structurally aligned" and
cross-check via gamma = sigma_u^2 / (sigma_u^2 + sigma_v^2)
or the mean efficiency in `model_info['mean_efficiency_bc']`,
both of which are far less sensitive to the local-optimum gap.
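The gamma cross-check is simple arithmetic. The sketch below uses made-up values (a σ_u that is 40% larger with σ_v held equal) purely to show how the comparison could be computed; the helper names are hypothetical and the numbers are not results from Stata or from this package.

```python
def gamma_share(sigma_u, sigma_v):
    """gamma = sigma_u^2 / (sigma_u^2 + sigma_v^2)."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_v ** 2)


def rel_gap(a, b):
    """Relative difference of a with respect to the reference value b."""
    return abs(a - b) / abs(b)


# Illustrative values only
here = {"sigma_u": 0.30, "sigma_v": 0.20}
other = {"sigma_u": 0.42, "sigma_v": 0.20}   # sigma_u 40% larger, sigma_v identical

gap_sigma_u = rel_gap(other["sigma_u"], here["sigma_u"])
gap_gamma = rel_gap(gamma_share(**other), gamma_share(**here))
print(f"sigma_u gap: {gap_sigma_u:.1%}, gamma gap: {gap_gamma:.1%}")
```

Because gamma is bounded between 0 and 1, a given relative gap in σ_u translates into a smaller relative gap in gamma in this illustration.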
te_summary ¶
Return a small descriptive DataFrame of TE scores (summary stats only).
te_rank ¶
te_rank(result, method: Optional[str] = None, with_ci: bool = False, alpha: float = 0.05, B: int = 500, seed: Optional[int] = 0) -> DataFrame
Return efficiency scores sorted descending, with rank column.
If with_ci=True, calls `FrontierResult.efficiency_ci` for
parametric-bootstrap bounds. For very large samples prefer a small B.
zisf ¶
zisf(data: DataFrame, y: str, x: List[str], *, zprob: Optional[List[str]] = None, dist: str = 'half-normal', cost: bool = False, vce: str = 'oim', cluster: Optional[str] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult
Zero-Inefficiency Stochastic Frontier (Kumbhakar-Parmeter-Tsionas 2013).
The population is a mixture of two regimes:
- Fully efficient (share p_i): y_it = x_it' beta + v_it.
- Inefficient (share 1 - p_i): standard composed-error frontier y = x'beta + v + sign * u.
The mixing probability p_i is parameterised via a logit link:
p_i = expit(z_i' theta) where z_i = [1, zprob_vars_i].
If zprob=None the probability is constant across observations.
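A minimal sketch of the mixture described above: the logit link for p_i and the resulting density of the residual as a weighted sum of a pure-noise normal density and the normal/half-normal composed-error density. Function names and parameter values are illustrative assumptions, not the library's internals.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + exp(-t))


def zisf_density(eps, p, sigma_u, sigma_v, cost=False):
    """Mixture density: p * N(0, sigma_v^2) + (1 - p) * composed-error density."""
    sign = 1.0 if cost else -1.0            # eps = v + sign * u
    sigma = sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    f_efficient = _N.pdf(eps / sigma_v) / sigma_v
    f_inefficient = 2.0 / sigma * _N.pdf(eps / sigma) * _N.cdf(sign * eps * lam / sigma)
    return p * f_efficient + (1.0 - p) * f_inefficient


# Hypothetical theta and covariate row z_i = [1, zprob_var_i]
theta = [0.2, -1.0]
z_i = [1.0, 0.5]
p_i = expit(sum(t * z for t, z in zip(theta, z_i)))
print(f"p_i = {p_i:.4f}")
print(zisf_density(-0.1, p_i, sigma_u=0.3, sigma_v=0.2))
```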
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `zprob` | `list of str` | Covariates for the mixing probability; a constant is added. | `None` |
| `dist` | `'half-normal'` | Distribution of | `'half-normal'` |
Returns:
| Type | Description |
|---|---|
| `FrontierResult` | |
lcsf ¶
lcsf(data: DataFrame, y: str, x: List[str], *, z_class: Optional[List[str]] = None, dist: str = 'half-normal', cost: bool = False, vce: str = 'oim', cluster: Optional[str] = None, maxiter: int = 500, tol: float = 1e-08, alpha: float = 0.05) -> FrontierResult
Two-class Latent-Class SFA (Orea-Kumbhakar 2004; Greene 2005).
Each observation belongs latently to class 1 or class 2, each with
its own frontier coefficients beta_k and variance parameters
sigma_v_k, sigma_u_k. Class probability optionally depends
on z_class via logit.
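Posterior class probabilities in a two-class model follow from Bayes' rule applied to the class-specific densities. The sketch below assumes normal/half-normal production frontiers in both classes; the function names, residuals, and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def sf_density(eps, sigma_u, sigma_v):
    """Normal/half-normal composed-error density, production form (eps = v - u)."""
    sigma = sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * _N.pdf(eps / sigma) * _N.cdf(-eps * lam / sigma)


def posterior_class1(eps1, eps2, prior1, sig1, sig2):
    """P(class 1 | data) for one observation.

    eps_k is the residual under class-k coefficients beta_k;
    sig_k is the pair (sigma_u_k, sigma_v_k).
    """
    f1 = prior1 * sf_density(eps1, *sig1)
    f2 = (1.0 - prior1) * sf_density(eps2, *sig2)
    return f1 / (f1 + f2)


# Illustrative residuals under each class's frontier and a prior class-1 share of 0.6
post = posterior_class1(eps1=-0.05, eps2=-0.40, prior1=0.6,
                        sig1=(0.15, 0.10), sig2=(0.40, 0.20))
print(f"posterior P(class 1) = {post:.4f}")
```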
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `z_class` | `list of str` | Covariates shifting the class-1 logit probability. | `None` |
Returns:
| Type | Description |
|---|---|
| `FrontierResult` | Result with extended `params` block and per-obs posterior class probabilities in `diagnostics['p_class1_posterior']`. |
translog_design ¶
translog_design(data: DataFrame, inputs: List[str], *, include_interactions: bool = True, include_squares: bool = True, interaction_prefix: str = '') -> DataFrame
Build a translog design matrix from Cobb-Douglas inputs.
Translog is log y = alpha + sum_k beta_k * log x_k
+ 0.5 sum_k sum_l gamma_{kl} * log x_k * log x_l.
This helper takes input columns (already in log form) and returns a
DataFrame with the original columns plus squares x_k^2 / 2 and
cross-products x_k * x_l that can be fed straight to
`frontier` / `xtfrontier` as additional regressors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | | required |
| `inputs` | `list of str` | Columns containing | required |
| `include_interactions` | `bool` | If True, adds | `True` |
| `include_squares` | `bool` | If True, adds | `True` |
| `interaction_prefix` | `str` | Optional prefix for the generated columns (e.g., | `""` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Original data + appended translog terms. Two lists are stored on |
Examples:
>>> import statspai as sp
>>> from statspai.frontier import translog_design
>>> # df: DataFrame with columns log_y, log_k, log_l (already in logs)
>>> df_tl = translog_design(df, inputs=["log_k", "log_l"])
>>> # Option A: one-liner, pass the full translog regressor list
>>> terms = df_tl.attrs["translog_terms"]
>>> sp.frontier(df_tl, y="log_y", x=terms)
>>> # Option B: extend an existing x list without double-counting
>>> base = ["log_k", "log_l"]
>>> sp.frontier(df_tl, y="log_y",
...             x=base + df_tl.attrs["translog_added_terms"])
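To make the generated terms concrete, here is a dependency-free sketch of the same construction on plain lists: squares x_k^2 / 2 and pairwise cross-products x_k * x_l. The function name and the column-naming scheme (`_sq`, `_x_`) are assumptions for illustration and may differ from the names the library produces.

```python
from itertools import combinations


def translog_terms(columns, inputs, include_squares=True,
                   include_interactions=True, prefix=""):
    """Return (columns plus translog terms, list of added column names).

    `columns` maps column name -> list of log-transformed values.
    """
    out = {name: list(vals) for name, vals in columns.items()}
    added = []
    if include_squares:
        for k in inputs:
            name = f"{prefix}{k}_sq"            # hypothetical naming
            out[name] = [0.5 * v * v for v in columns[k]]
            added.append(name)
    if include_interactions:
        for k, l in combinations(inputs, 2):    # each unordered pair once
            name = f"{prefix}{k}_x_{l}"         # hypothetical naming
            out[name] = [a * b for a, b in zip(columns[k], columns[l])]
            added.append(name)
    return out, added


cols = {"log_k": [1.0, 2.0], "log_l": [3.0, 4.0]}
design, added = translog_terms(cols, inputs=["log_k", "log_l"])
print(added)
print(design["log_k_sq"], design["log_k_x_log_l"])
```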