skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/svy/SKILL.md
Complex survey analysis: strata/PSU/weights, variance estimation (Taylor, BRR, jackknife, bootstrap), survey GLM, domain analysis, calibration. Polars-native. Use for NHANES, CPS, ACS PUMS, BRFSS, DHS. Non-survey regression: statsmodels/pyfixest.
svy: design-based analysis of complex survey data in Python. Covers survey design specification (strata, PSU, weights, FPC), variance estimation (Taylor linearization, BRR, jackknife, bootstrap), descriptive estimation (means, totals, proportions, ratios, medians), survey-weighted GLM regression (gaussian, binomial, Poisson), domain/subpopulation analysis, calibration, and survey data I/O (SAS, SPSS, Stata). Uses Polars DataFrames natively. Use when analyzing data from complex sample surveys (NHANES, CPS, ACS PUMS, MEPS, ECLS-K, BRFSS, DHS). For non-survey regression, use statsmodels; for fixed effects, use pyfixest; for panel/IV models, use linearmodels.
Comprehensive skill for complex survey data analysis with svy. Use the decision trees below to find the right guidance, then load the detailed references.
svy is the Python package for design-based analysis of complex survey data.
This skill targets svy 0.13.0 (released 2026-03-25). svy supersedes samplics (archived 2026-03-10), an earlier library by the same author (Mamadou S. Diallo, Ph.D.). Key differences from samplics:
- Sample object replaces separate TaylorEstimator / ReplicateEstimator classes
- Survey data I/O (svy.io)

| File | Purpose | When to Read |
|------|---------|--------------|
| estimation.md | Means, totals, proportions, ratios, medians, domain estimation, cross-tabs, hypothesis tests | Descriptive survey statistics |
| regression.md | Survey-weighted OLS, logistic, Poisson regression; extracting results; diagnostics | Survey regression models |
| design-weights.md | Design specification, replicate weights, weight manipulation, variance setup, survey data I/O, federal survey patterns | Setting up the survey design object |
- Descriptive estimation: design-weights.md then estimation.md
- Regression: design-weights.md then regression.md
- Replicate weights: design-weights.md (replicate design section) then estimation.md or regression.md
- Federal surveys: design-weights.md (federal survey patterns table)
- Migrating from samplics: design-weights.md for the new API; the Sample object replaces TaylorEstimator/ReplicateEstimator

| Skill | Relationship |
|-------|-------------|
| data-scientist | Provides methodology guidance (especially survey-analysis.md); svy provides implementation. Load data-scientist for "when and why" to use survey methods |
| statsmodels | Complement for non-survey regression (OLS, GLM, time series, diagnostics). WLS in statsmodels is NOT survey-weighted regression — it does not account for stratification or clustering |
| pyfixest | Complement for fixed effects models and DiD. pyfixest does not handle complex survey designs; use svy for survey-weighted estimation, pyfixest for FE/DiD |
| linearmodels | Complement for panel models (RE, FD, Fama-MacBeth) and IV/GMM. Does not handle survey designs |
| polars | svy uses Polars DataFrames natively. Load polars skill for data preparation before passing to svy |
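
The statsmodels row above warns that weighting alone does not account for stratification or clustering. The following is a minimal pure-Python sketch of why that matters, not svy's implementation: it computes a Taylor-linearization standard error for a weighted mean, once with the strata and PSUs and once ignoring them. The function names and the toy data are hypothetical, and the sketch assumes PSUs are sampled with replacement (no FPC).

```python
from collections import defaultdict
from math import sqrt


def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def linearized_se_mean(y, w, strata, psu):
    """Taylor-linearization SE of a weighted mean.

    Assumes PSUs are sampled with replacement within strata (no FPC).
    """
    ybar = weighted_mean(y, w)
    wsum = sum(w)
    # Sum the linearized values z_i = w_i * (y_i - ybar) / sum(w) within each PSU.
    totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, j in zip(y, w, strata, psu):
        totals[h][j] += wi * (yi - ybar) / wsum
    variance = 0.0
    for h, psu_totals in totals.items():
        n_h = len(psu_totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        zbar = sum(psu_totals.values()) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psu_totals.values())
    return sqrt(variance)


# Toy data (hypothetical): two strata, two PSUs each, equal weights.
y = [1, 1, 9, 9, 2, 2, 8, 8]
w = [2.0] * 8
strata = ["A", "A", "A", "A", "B", "B", "B", "B"]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

se_design = linearized_se_mean(y, w, strata, psu)
# Ignoring the design: one stratum, every row treated as its own PSU.
se_ignoring = linearized_se_mean(y, w, ["all"] * 8, list(range(8)))
print(f"mean={weighted_mean(y, w):.2f}  design SE={se_design:.3f}  ignoring design SE={se_ignoring:.3f}")
```

Both calls return the same point estimate, but the standard error that respects the clustering is larger here because observations within a PSU are identical. A weights-only fit would report something closer to the smaller figure.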
What task?
├─ Descriptive statistics (mean, total, proportion)
│ └─ ./references/estimation.md
├─ Regression model
│ ├─ Linear (continuous outcome) → ./references/regression.md
│ ├─ Logistic (binary outcome) → ./references/regression.md
│ └─ Poisson (count outcome) → ./references/regression.md
├─ Set up the survey design object
│ └─ ./references/design-weights.md
├─ Read survey data from SAS/SPSS/Stata
│ └─ ./references/design-weights.md
├─ Subpopulation / domain analysis
│ └─ ./references/estimation.md
└─ Cross-tabulation
  └─ ./references/estimation.md
What model?
├─ Linear regression (continuous Y)
│ └─ family="gaussian" → ./references/regression.md
├─ Logistic regression (binary Y)
│ └─ family="binomial" → ./references/regression.md
├─ Poisson regression (count Y)
│ └─ family="poisson" → ./references/regression.md
├─ Ordinal logistic / Cox survival / IV
│ └─ Not in svy — use rpy2 + R survey package (see rpy2 bridge below)
└─ Fixed effects + survey weights
  └─ Not directly supported; see Boundaries below
What do you have?
├─ Design variables (strata, PSU, weights)
│ └─ Taylor linearization → ./references/design-weights.md
├─ Pre-computed replicate weights
│ ├─ BRR weights → ./references/design-weights.md
│ ├─ Jackknife weights → ./references/design-weights.md
│ └─ Bootstrap weights → ./references/design-weights.md
├─ Need to create replicate weights from design
│ └─ ./references/design-weights.md
└─ Not sure what I have
  └─ Read survey documentation first → ./references/design-weights.md (federal survey table)
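
When a survey ships pre-computed replicate weights, the variance comes from re-estimating the statistic once per replicate. The sketch below illustrates that idea in plain Python using the standard textbook multipliers for BRR, Fay's BRR, and the delete-one (JK1) jackknife. It is an illustration rather than svy's code; `replicate_variance` and the toy weights are hypothetical, and multipliers for stratified jackknife or bootstrap replicates should be taken from the survey's own documentation.

```python
def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def replicate_variance(theta_full, theta_reps, method, fay_rho=0.0):
    """Variance from replicate estimates using textbook multipliers."""
    r = len(theta_reps)
    ss = sum((t - theta_full) ** 2 for t in theta_reps)
    if method == "brr":
        # Fay's modification divides by (1 - rho)^2; rho = 0 is plain BRR.
        return ss / (r * (1.0 - fay_rho) ** 2)
    if method == "jk1":
        # Delete-one jackknife without stratification.
        return (r - 1) / r * ss
    raise ValueError(f"no built-in multiplier for method {method!r}")


# Toy data (hypothetical): four units, JK1 replicates drop one unit each
# and scale the remaining weights by n / (n - 1).
y = [1.0, 2.0, 3.0, 4.0]
full_w = [1.0, 1.0, 1.0, 1.0]
rep_w = [
    [0.0 if i == dropped else 4.0 / 3.0 for i in range(4)]
    for dropped in range(4)
]

theta = weighted_mean(y, full_w)
theta_reps = [weighted_mean(y, w) for w in rep_w]
var_jk1 = replicate_variance(theta, theta_reps, "jk1")
print(f"mean={theta:.3f}  JK1 variance={var_jk1:.4f}")
```

For this toy case the JK1 result equals the usual simple-random-sampling variance of the mean, s²/n, which is a quick sanity check on the multiplier.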
What statistic?
├─ Population mean → ./references/estimation.md
├─ Population total → ./references/estimation.md
├─ Proportion → ./references/estimation.md
├─ Ratio (Y/X) → ./references/estimation.md
├─ Median / quantile → ./references/estimation.md
├─ Cross-tabulation → ./references/estimation.md
├─ By subgroup (domain estimation) → ./references/estimation.md
└─ Hypothesis test (t-test) → ./references/estimation.md
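
As a reminder of what each statistic in the tree above estimates, here is a small pure-Python sketch of the weighted point estimators. It covers point estimates only; the helper names and data are hypothetical and do not mirror svy's API. For domain estimates, variance estimation should still use the full design rather than a subset of rows, which is why a survey package takes a grouping argument instead of pre-filtered data.

```python
from collections import defaultdict


def est_total(y, w):
    """Population total: sum of w * y."""
    return sum(wi * yi for wi, yi in zip(w, y))


def est_mean(y, w):
    """Population mean: weighted total divided by the sum of weights."""
    return est_total(y, w) / sum(w)


def est_ratio(y, x, w):
    """Ratio of two weighted totals, Y / X."""
    return est_total(y, w) / est_total(x, w)


def est_domain_means(y, w, group):
    """Weighted mean within each domain (point estimates only)."""
    num = defaultdict(float)
    den = defaultdict(float)
    for yi, wi, g in zip(y, w, group):
        num[g] += wi * yi
        den[g] += wi
    return {g: num[g] / den[g] for g in num}


# Toy data (hypothetical).
y = [10.0, 20.0, 30.0, 40.0]
x = [1.0, 1.0, 2.0, 2.0]
w = [1.0, 2.0, 3.0, 4.0]
group = ["a", "b", "a", "b"]

print(est_total(y, w), est_mean(y, w), est_ratio(y, x, w))
print(est_domain_means(y, w, group))
```

A proportion is the same as `est_mean` applied to a 0/1 indicator variable.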
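svy covers:
- Survey design specification (strata, PSU, weights, FPC)
- Variance estimation (Taylor linearization, BRR, jackknife, bootstrap)
- Descriptive estimation (means, totals, proportions, ratios, medians)
- Survey-weighted GLM regression (gaussian, binomial, Poisson)
- Domain/subpopulation analysis and calibration
- Survey data I/O (SAS, SPSS, Stata)

svy does NOT cover (use other tools):
- Non-survey regression: use statsmodels
- Fixed effects and DiD without a survey design: use pyfixest
- Panel and IV models: use linearmodels
- Fixed effects combined with survey weights: not directly supported
- Ordinal logistic, survival, negative binomial, and cumulative link models: use rpy2 with the R survey package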
For models svy does not support (ordinal logistic, survival models, negative binomial GLM, cumulative link models), fall back to R's survey package via rpy2:
Decision rule: If the model family is not "gaussian", "binomial", or "poisson", use rpy2.
The R survey package (survey::svyglm, survey::svyolr, survey::svycoxph) covers the full range of survey-weighted models. Set up the survey design in R using the same design variables you would pass to svy.Design. See R survey package documentation at r-survey.r-forge.r-project.org for API details.
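
The decision rule above can be encoded as a small guard at the top of an analysis script. This is a minimal sketch; `choose_backend` and the returned labels are hypothetical names used for illustration, not part of svy or rpy2.

```python
# Families the skill lists as supported by svy's GLM.
SVY_FAMILIES = frozenset({"gaussian", "binomial", "poisson"})


def choose_backend(family: str) -> str:
    """Return "svy" for supported families, otherwise "rpy2"."""
    return "svy" if family.strip().lower() in SVY_FAMILIES else "rpy2"


for fam in ("gaussian", "Binomial", "negative_binomial", "ordinal"):
    print(f"{fam}: {choose_backend(fam)}")
```

Routing on the family name early keeps the fallback to R explicit in the script's audit trail instead of surfacing as a runtime error.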
samplics (2020-2026) is archived. svy supersedes it with a cleaner API, Polars integration, and expanded methods. If working with legacy code that uses samplics:
- TaylorEstimator/ReplicateEstimator classes are replaced by svy.Sample
- See samplics-org.github.io/samplics/ for legacy reference

Important: In data research pipelines (see CLAUDE.md), svy analyses are executed through script files, not interactively. This ensures auditability and reproducibility.
The pattern:
scripts/stage8_analysis/{step}_{task-name}.py

Closely read agent_reference/SCRIPT_EXECUTION_REFERENCE.md for the mandatory file-first execution protocol. All survey analysis scripts must follow the Inline Audit Trail (IAT) standard: document design specification choices (why these strata/PSU/weights, what variance method, domain definitions) with # INTENT:, # REASONING:, and # ASSUMES: comments.
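
A script header following this pattern might look like the sketch below. It is illustrative only: the zero-padded step number, the `script_path` helper, and the `DESIGN_SPEC` dictionary are assumptions rather than requirements of the protocol, and the variable names are the NHANES ones used elsewhere in this skill.

```python
"""Hypothetical stage-8 script: scripts/stage8_analysis/01_mean-bmi.py"""
from pathlib import Path

# INTENT: Declare the survey design once so every estimate in this script uses it.
# REASONING: NHANES publishes masked variance units (sdmvstra, sdmvpsu); BMI is an
#            examination measure, so the MEC exam weight (wtmec2yr) applies.
# ASSUMES: A single two-year cycle; pooled cycles would need rescaled weights.
DESIGN_SPEC = {
    "stratum": "sdmvstra",
    "psu": "sdmvpsu",
    "wgt": "wtmec2yr",
    "variance": "taylor",
}


def script_path(step: int, task_name: str) -> Path:
    """Build a path of the form scripts/stage8_analysis/{step}_{task-name}.py."""
    # ASSUMES: two-digit zero padding for the step number (a convention chosen here).
    return Path("scripts/stage8_analysis") / f"{step:02d}_{task_name}.py"


print(script_path(1, "mean-bmi").as_posix())
```

Keeping the design choices in one commented block near the top makes the reasoning reviewable without reading the estimation code.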
```python
import svy

# 1. Load data
data = svy.io.read_stata("nhanes.dta")

# 2. Specify design
design = svy.Design(stratum="sdmvstra", psu="sdmvpsu", wgt="wtmec2yr")

# 3. Create sample object
sample = svy.Sample(data=data, design=design)

# 4. Estimate
mean_bmi = sample.estimation.mean("bmxbmi")
model = sample.glm.fit(y="bmxbmi", x=["ridageyr", svy.Cat("riagendr")], family="gaussian")
```
| Operation | Code |
|-----------|------|
| Design (Taylor) | svy.Design(stratum="s", psu="p", wgt="w") |
| Sample object | svy.Sample(data=df, design=design) |
| Mean | sample.estimation.mean("var") |
| Total | sample.estimation.total("var") |
| Proportion | sample.estimation.prop("var") |
| Ratio | sample.estimation.ratio(y="num", x="denom") |
| Median | sample.estimation.median("var") |
| Domain estimation | sample.estimation.mean("var", by="group") |
| Linear regression | sample.glm.fit(y="y", x=[...], family="gaussian") |
| Logistic regression | sample.glm.fit(y="y", x=[...], family="binomial") |
| Poisson regression | sample.glm.fit(y="y", x=[...], family="poisson") |
| Categorical predictor | svy.Cat("varname") |
| Read Stata | svy.io.read_stata("file.dta") |
| Read SAS | svy.io.read_sas("file.sas7bdat") |
| Read SPSS | svy.io.read_spss("file.sav") |
| Topic | Reference File |
|-------|---------------|
| Survey design setup | ./references/design-weights.md |
| Taylor linearization | ./references/design-weights.md |
| Replicate weights (BRR, jackknife, bootstrap) | ./references/design-weights.md |
| Fay's BRR modification | ./references/design-weights.md |
| Weight types and handling | ./references/design-weights.md |
| Federal survey design patterns | ./references/design-weights.md |
| Singleton PSU handling | ./references/design-weights.md |
| Calibration and post-stratification | ./references/design-weights.md |
| Reading SAS/SPSS/Stata files | ./references/design-weights.md |
| Population means | ./references/estimation.md |
| Population totals | ./references/estimation.md |
| Proportions | ./references/estimation.md |
| Ratios | ./references/estimation.md |
| Medians and quantiles | ./references/estimation.md |
| Domain / subpopulation estimation | ./references/estimation.md |
| Cross-tabulations | ./references/estimation.md |
| Survey-weighted t-tests | ./references/estimation.md |
| Design effects (DEFF) | ./references/estimation.md |
| Survey-weighted OLS | ./references/regression.md |
| Survey-weighted logistic regression | ./references/regression.md |
| Survey-weighted Poisson regression | ./references/regression.md |
| Extracting regression results | ./references/regression.md |
| Survey regression vs. WLS vs. cluster-robust | ./references/regression.md |
| Categorical predictors (svy.Cat) | ./references/regression.md |
| Model diagnostics in survey context | ./references/regression.md |
| rpy2 bridge to R survey package | ./references/regression.md |
| samplics migration | ./references/design-weights.md |
| Polars DataFrame integration | ./references/design-weights.md |
When this library is used as a primary analytical tool, include in the report's Software & Tools references:
Diallo, M.S. svy: Python package for complex survey sampling and analysis [Computer software]. (Formerly samplics.)
Cite when: svy is used for survey-weighted estimation with complex survey designs (strata, PSU, replicate weights).
Do not cite when: svy is only imported and no survey estimation is performed.
For method-specific citations (e.g., variance estimation techniques),
consult the reference files in this skill and agent_reference/CITATION_REFERENCE.md.