statspai.survey¶
survey ¶
Survey design and weighted estimation — StatsPAI's answer to R's survey
package and Stata's svy: prefix.
Supports stratified, clustered, and weighted survey designs with design-corrected standard errors for means, totals, and regression.
    >>> import statspai as sp
    >>> design = sp.svydesign(data=df, weights='pw', strata='stratum',
    ...                       cluster='psu')
    >>> design.mean('income')
    >>> design.total('income')
    >>> design.glm('income ~ education + age')
SurveyDesign ¶
Declare a complex survey design.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Survey microdata. | required |
| weights | str or array-like | Sampling weights (inverse probability). If str, column name in data. | required |
| strata | str or None | Stratification variable (column name). | None |
| cluster | str or None | Primary sampling unit (PSU) variable (column name). | None |
| fpc | str or None | Finite population correction: column of stratum population sizes or sampling fractions. If values are < 1 they are treated as fractions; otherwise as population counts. | None |
| nest | bool | If True, PSU ids are nested within strata (re-label internally). | False |
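The `fpc` rule above (values below 1 are fractions, anything else is a population count) can be illustrated with a small sketch. The helper name `fpc_to_sampling_fraction` and the `n_sampled_psus` argument are hypothetical and not part of StatsPAI; the sketch assumes the usual convention that a population count N_h is converted to a sampling fraction n_h / N_h, where n_h is the number of sampled PSUs in the stratum.

```python
import numpy as np

def fpc_to_sampling_fraction(fpc_values, n_sampled_psus):
    """Hypothetical helper: convert an fpc column to per-row sampling fractions."""
    fpc_values = np.asarray(fpc_values, dtype=float)
    n_sampled_psus = np.asarray(n_sampled_psus, dtype=float)
    # Values below 1 are taken as sampling fractions already;
    # anything else is read as a stratum population count N_h (assumed n_h / N_h).
    return np.where(fpc_values < 1, fpc_values, n_sampled_psus / fpc_values)

# Three rows: one fraction and two population counts.
fractions = fpc_to_sampling_fraction([0.1, 200, 50], [10, 20, 5])
# The finite population correction shrinks each stratum's variance by (1 - f_h).
multipliers = 1.0 - fractions
```

Under these assumptions, a fraction of 0.1 and a count of 200 with 20 sampled PSUs describe the same design.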
svydesign ¶
svydesign(data: DataFrame, weights: Union[str, ndarray], strata: Optional[str] = None, cluster: Optional[str] = None, fpc: Optional[str] = None, nest: bool = False) -> SurveyDesign
svymean ¶
svymean(variables: Union[str, List[str]], design: 'SurveyDesign', alpha: float = 0.05) -> SurveyResult
Survey-weighted mean with design-corrected standard errors.
Uses the same Taylor-series linearisation as R's survey::svymean.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variables | str or list of str | Column name(s) in the design's data. | required |
| design | SurveyDesign | | required |
| alpha | float | | 0.05 |

Returns:

| Type | Description |
|---|---|
| SurveyResult | |
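As a rough illustration of Taylor-series linearisation for a weighted mean, the sketch below uses plain pandas and NumPy rather than StatsPAI itself. The function name `linearised_mean` and the toy data are assumptions made for the example. It treats PSUs as sampled with replacement within strata and ignores any finite population correction, so it is not necessarily what `svymean` does internally.

```python
import numpy as np
import pandas as pd

def linearised_mean(df, y, w, strata, psu):
    """Weighted mean with a Taylor-linearised SE (with-replacement PSUs, no fpc)."""
    wsum = df[w].sum()
    mean = (df[w] * df[y]).sum() / wsum
    # Linearised values of the ratio estimator sum(w*y) / sum(w).
    z = df[w] * (df[y] - mean) / wsum
    # Total the linearised values within each PSU, keeping the stratum label.
    psu_totals = z.groupby([df[strata], df[psu]]).sum()
    var = 0.0
    for _, zh in psu_totals.groupby(level=0):
        n_h = len(zh)
        if n_h > 1:  # single-PSU strata contribute nothing in this sketch
            var += n_h / (n_h - 1) * ((zh - zh.mean()) ** 2).sum()
    return mean, np.sqrt(var)

df = pd.DataFrame({
    "income": [10.0, 12.0, 20.0, 22.0, 30.0, 34.0, 40.0, 44.0],
    "pw": [1, 1, 1, 1, 2, 2, 2, 2],
    "stratum": [1, 1, 1, 1, 2, 2, 2, 2],
    "psu": [1, 1, 2, 2, 3, 3, 4, 4],
})
mean, se = linearised_mean(df, "income", "pw", "stratum", "psu")
```

With equal weights, one stratum, and each observation as its own PSU, this reduces to the familiar standard error of a simple random sample mean.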
svytotal ¶
svytotal(variables: Union[str, List[str]], design: 'SurveyDesign', alpha: float = 0.05) -> SurveyResult
Survey-weighted total with design-corrected standard errors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| variables | str or list of str | | required |
| design | SurveyDesign | | required |
| alpha | float | | 0.05 |

Returns:

| Type | Description |
|---|---|
| SurveyResult | |
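A design-corrected standard error for a weighted total can be sketched in the same spirit. The function `weighted_total` and its `fractions` argument (a mapping from stratum label to sampling fraction) are hypothetical names for this example. The variance formula is the standard between-PSU estimator with an optional finite population correction, which may differ in detail from the library's implementation.

```python
import numpy as np
import pandas as pd

def weighted_total(df, y, w, strata, psu, fractions=None):
    """Weighted total with a between-PSU variance estimate per stratum."""
    z = df[w] * df[y]            # weighted contributions to the total
    total = z.sum()
    psu_totals = z.groupby([df[strata], df[psu]]).sum()
    var = 0.0
    for h, zh in psu_totals.groupby(level=0):
        n_h = len(zh)
        if n_h < 2:
            continue             # variance not estimable from a single PSU
        f_h = 0.0 if fractions is None else fractions[h]
        var += (1 - f_h) * n_h / (n_h - 1) * ((zh - zh.mean()) ** 2).sum()
    return total, np.sqrt(var)

# Simple random sample of 4 units from a population of 8 (weights of 2).
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "pw": [2.0, 2.0, 2.0, 2.0],
    "stratum": ["A", "A", "A", "A"],
    "psu": [1, 2, 3, 4],
})
total, se = weighted_total(df, "y", "pw", "stratum", "psu", fractions={"A": 0.5})
```

For this toy sample the result agrees with the textbook simple-random-sampling variance N² (1 - f) s² / n.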
svyglm ¶
Survey-weighted generalised linear model.
Fits WLS (for gaussian family) or weighted IRLS (for binomial/poisson) and computes design-corrected standard errors via the sandwich estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | | required |
| design | SurveyDesign | | required |
| family | str | | 'gaussian' |
| alpha | float | | 0.05 |

Returns:

| Type | Description |
|---|---|
| SurveyResult | Regression coefficient estimates. |
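For the gaussian case, the combination of WLS point estimates and a sandwich variance can be sketched with NumPy alone. The function `wls_sandwich` is an illustrative name, not a StatsPAI function. The sketch assumes a single stratum, PSUs sampled with replacement, and the common n / (n - 1) small-sample factor; the library's exact estimator may differ.

```python
import numpy as np

def wls_sandwich(X, y, w, psu):
    """WLS coefficients with a PSU-clustered sandwich covariance (one stratum)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtWX = X.T @ (w[:, None] * X)
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    # Weighted score contribution of each observation.
    scores = X * (w * resid)[:, None]
    # Sum the scores within each PSU.
    labels, inverse = np.unique(np.asarray(psu), return_inverse=True)
    G = np.zeros((len(labels), X.shape[1]))
    np.add.at(G, inverse, scores)
    n = len(labels)
    centred = G - G.mean(axis=0)
    meat = n / (n - 1) * centred.T @ centred
    bread = np.linalg.inv(XtWX)
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))

x = np.arange(8.0)
X = np.column_stack([np.ones_like(x), x])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3])
y = 1.0 + 2.0 * x + noise
w = np.array([1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0])
psu = np.array([0, 0, 1, 1, 2, 2, 3, 3])
beta, se = wls_sandwich(X, y, w, psu)
```

The "bread" is the inverse weighted cross-product matrix and the "meat" is the between-PSU variation of the summed scores, which is what makes the standard errors reflect the clustering.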
rake ¶
rake(data: DataFrame, margins: Dict[str, Dict], weight: Optional[str] = None, max_iter: int = 100, tol: float = 1e-06) -> CalibrationResult
Raking (iterative proportional fitting).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| margins | dict | | required |
| weight | str | Existing design weight column. If | None |
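Iterative proportional fitting itself is short enough to sketch in pandas. The function `rake_sketch` below is not the library's `rake`. It assumes that `margins` maps each variable name to a dict of category totals, and that a missing `weight` means starting from unit weights; neither assumption is confirmed by the reference above.

```python
import pandas as pd

def rake_sketch(df, margins, weight=None, max_iter=100, tol=1e-6):
    """Rescale weights until every margin matches its target totals."""
    if weight is None:
        w = pd.Series(1.0, index=df.index)   # assumed default: unit weights
    else:
        w = df[weight].astype(float).copy()
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            current = w.groupby(df[var]).sum()
            # Adjustment factor per row: target total / current weighted total.
            factors = df[var].map(lambda c: targets[c] / current[c])
            max_change = max(max_change, (factors - 1).abs().max())
            w = w * factors
        if max_change < tol:
            break
    return w

df = pd.DataFrame({
    "sex": ["m", "m", "f", "f", "m", "f"],
    "age": ["y", "o", "y", "o", "y", "o"],
})
margins = {"sex": {"m": 60.0, "f": 40.0}, "age": {"y": 50.0, "o": 50.0}}
w = rake_sketch(df, margins)
```

Each pass fixes one margin at a time, so the earlier margins drift slightly; the loop stops once all adjustment factors are within `tol` of 1.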
linear_calibration ¶
linear_calibration(data: DataFrame, totals: Dict[str, float], weight: Optional[str] = None) -> CalibrationResult
Deville-Särndal (1992) linear calibration.
Find calibrated weights w_i = g_i d_i, where d_i are the design weights, minimising the chi-square distance
Σ_i (w_i - d_i)² / d_i = Σ_i d_i (g_i - 1)² subject to Σ_i g_i d_i x_{ik} = T_k
for each auxiliary variable k with known total T_k.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| totals | dict | | required |
| weight | str | Design weight column. If | None |
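Linear calibration under the standard Deville-Särndal chi-square distance has a closed-form solution, which the following NumPy sketch implements. The function name `linear_calibration_sketch` and the array-based interface are assumptions for illustration; the library function takes a DataFrame and a dict of totals instead.

```python
import numpy as np

def linear_calibration_sketch(X, d, totals):
    """Closed-form linear calibration: w_i = d_i * (1 + x_i' lambda)."""
    X = np.asarray(X, dtype=float)       # auxiliary variables, one row per unit
    d = np.asarray(d, dtype=float)       # design weights
    T = np.asarray(totals, dtype=float)  # known population totals
    A = X.T @ (d[:, None] * X)           # sum_i d_i x_i x_i'
    lam = np.linalg.solve(A, T - X.T @ d)
    g = 1.0 + X @ lam                    # calibration factors g_i
    return g * d

# Auxiliaries: a constant (population size) and age.
X = np.array([[1.0, 20.0], [1.0, 30.0], [1.0, 40.0], [1.0, 50.0]])
d = np.array([10.0, 10.0, 10.0, 10.0])
totals = np.array([50.0, 1800.0])
w = linear_calibration_sketch(X, d, totals)
```

The calibrated weights reproduce the known totals exactly, but unlike raking they are not guaranteed to stay positive.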