brycewang-stanford/survey-data-processing

Name: survey-data-processing
Author: brycewang-stanford

skills/43-wentorai-research-plugins/skills/analysis/wrangling/survey-data-processing/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research survey-data-processing


Survey Data Processing

A skill for cleaning, recoding, and preparing survey response data for statistical analysis. Covers handling common survey data issues such as incomplete responses, attention check failures, reverse-coded items, scale construction, open-ended response coding, and export to analysis-ready formats compatible with SPSS, Stata, and R.

Survey Data Quality Assessment

Initial Inspection Workflow

Platforms such as Qualtrics, SurveyMonkey, REDCap, and Google Forms each export survey data in their own format, with their own quirks. The first step is always to standardize the export, then assess its quality.

import pandas as pd
import numpy as np

def assess_survey_quality(df, duration_col="duration_seconds",
                          min_duration=60):
    """
    Generate a survey data quality report.

    Reports:
    - Share of respondents who answered at least 80% of the columns
    - Median response duration and the number of speeders
    - The ten most frequently skipped questions

    Straight-line responding and attention checks are handled
    separately by detect_straightlining() and check_attention_items().
    """
    report = {}

    # Overall completion: a row is "substantially complete" when at
    # least 80% of its columns are non-missing
    total_respondents = len(df)
    complete = df.dropna(thresh=int(len(df.columns) * 0.8))
    report["total_responses"] = total_respondents
    report["substantially_complete"] = len(complete)
    if total_respondents > 0:  # avoid dividing by zero on an empty export
        report["completion_rate"] = f"{len(complete)/total_respondents*100:.1f}%"

    # Duration analysis (speeders finish faster than min_duration seconds)
    if duration_col in df.columns:
        durations = df[duration_col].dropna()
        report["median_duration_seconds"] = durations.median()
        report["speeders"] = (durations < min_duration).sum()
        report["speeder_pct"] = f"{(durations < min_duration).mean()*100:.1f}%"

    # Missing data per question
    missing_by_col = df.isna().sum().sort_values(ascending=False)
    report["most_skipped_questions"] = missing_by_col.head(10).to_dict()

    return report
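The standardization step mentioned above is not shown in the skill itself. A minimal sketch of what it might look like follows. The `COLUMN_MAPS` dictionary, the `standardize_export` helper, and the column names in it are assumptions for illustration, since real export headers depend on the platform and the survey settings.

```python
import pandas as pd

# Hypothetical per-platform column maps (assumed names; check them
# against the headers in your own export files).
COLUMN_MAPS = {
    "qualtrics": {
        "ResponseId": "respondent_id",
        "Duration (in seconds)": "duration_seconds",
    },
    "generic": {"id": "respondent_id", "seconds": "duration_seconds"},
}


def standardize_export(df, platform):
    """Rename platform-specific columns and tidy the remaining names."""
    out = df.rename(columns=COLUMN_MAPS[platform])
    # Lowercase, then collapse runs of non-alphanumerics into "_".
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    return out


raw = pd.DataFrame({
    "ResponseId": ["R_1", "R_2"],
    "Duration (in seconds)": [45, 310],
    "Q1 - Satisfaction": [4, 5],
})
clean = standardize_export(raw, "qualtrics")
print(list(clean.columns))
```

Once every export shares the same column names, a function such as `assess_survey_quality` can be applied without platform-specific branches.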

Identifying Low-Quality Responses

def detect_straightlining(df, likert_columns, threshold=0.9):
    """
    Detect respondents who select the same answer for nearly
    all Likert-scale questions (straight-line responding).

    A respondent is flagged if the proportion of their most
    common response meets or exceeds the threshold. Respondents
    who answered fewer than two of the items are skipped, since
    a single answer always yields a proportion of 1.0.

    Returns a list of index labels for the flagged respondents.
    """
    flagged = []
    for idx, row in df[likert_columns].iterrows():
        responses = row.dropna()
        if len(responses) < 2:
            continue
        # value_counts() sorts descending, so iloc[0] is the modal count
        most_common_pct = responses.value_counts().iloc[0] / len(responses)
        if most_common_pct >= threshold:
            flagged.append(idx)

    return flagged


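def check_attention_items(df, attention_checks):
    """
    Validate attention check (trap) questions.

    Args:
        attention_checks: dict of {column_name: correct_answer}
        Example: {"q15_attention": 4, "q32_trap": "strongly agree"}

    Returns:
        List of index labels for respondents who failed at least one
        check. A missing answer counts as a failure, and string
        answers are compared exactly (case-sensitive).
    """
    failed = pd.Series(False, index=df.index)
    for col, correct in attention_checks.items():
        # NaN != correct evaluates to True, so skipped checks fail
        failed = failed | (df[col] != correct)

    return df.index[failed].tolist()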
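The row-by-row loop in `detect_straightlining` is easy to read but can be slow on large samples. A possible vectorized alternative is sketched below. The helper name `straightline_share` and the toy data are assumptions, not part of the skill. It returns the proportion itself, so the threshold can be tuned afterwards.

```python
import pandas as pd


def straightline_share(df, likert_columns):
    """Share of each respondent's answered items equal to their modal answer."""
    items = df[likert_columns]
    answered = items.notna().sum(axis=1)
    values = [v for v in pd.unique(items.to_numpy().ravel()) if pd.notna(v)]
    if not values:
        return pd.Series(float("nan"), index=df.index)
    # One column per observed answer value, counting its uses per respondent.
    counts = pd.concat([(items == v).sum(axis=1) for v in values], axis=1)
    # Respondents with no answers get NaN instead of a division by zero.
    return counts.max(axis=1) / answered.where(answered > 0)


# Toy data: "a" and "c" give one answer throughout, "b" varies.
toy = pd.DataFrame(
    {"q1": [3, 1, 5], "q2": [3, 2, 5], "q3": [3, 3, None], "q4": [3, 4, 5]},
    index=["a", "b", "c"],
)
share = straightline_share(toy, ["q1", "q2", "q3", "q4"])
flagged = share.index[share >= 0.9].tolist()
print(share)
print(flagged)
```

Keeping the share as a column, rather than only a list of flagged IDs, makes it simpler to report how sensitive the exclusions are to the chosen threshold.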

Recoding and Transformation

Reverse Coding

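Many validated psychological scales include reverse-coded items to counteract acquiescence bias (the tendency to agree regardless of content). These items must be recoded before computing scale scores, so that a high value means the same thing on every item.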

def reverse_code(df, columns, scale_max, scale_min=1):
    """
    Reverse-code specified columns for Likert-type scales.

    Formula: reversed = (scale_max + scale_min) - original

    Example for a 1-5 scale:
      1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
    """
    df_recoded = df.copy()
    for col in columns:
        df_recoded[col] = (scale_max + scale_min) - df[col]
    return df_recoded


# Example usage with a Big Five personality scale
# (df is the survey DataFrame loaded earlier)
reverse_items = {
    "extraversion": ["ext_2", "ext_4", "ext_6"],
    "neuroticism": ["neur_1", "neur_3", "neur_5"],
    "agreeableness": ["agree_3", "agree_5"],
}

# For a 1-7 Likert scale:
for items in reverse_items.values():
    df = reverse_code(df, items, scale_max=7, scale_min=1)
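The formula silently produces out-of-range values if a column contains codes outside the scale (for example, a 9 used for "don't know"). One way to guard against that is sketched here. The `reverse_code_checked` name and the sample data are hypothetical.

```python
import pandas as pd


def reverse_code_checked(df, columns, scale_max, scale_min=1):
    """Reverse-code columns, refusing values outside the stated scale."""
    out = df.copy()
    for col in columns:
        values = out[col].dropna()
        bad = values[(values < scale_min) | (values > scale_max)]
        if not bad.empty:
            raise ValueError(
                f"{col}: {len(bad)} value(s) outside {scale_min}-{scale_max}"
            )
        # Missing answers stay missing after the subtraction.
        out[col] = (scale_max + scale_min) - out[col]
    return out


responses = pd.DataFrame({"ext_1": [7, 2, 4], "ext_2": [1, 6, None]})
recoded = reverse_code_checked(responses, ["ext_2"], scale_max=7)
print(recoded)
```

On a 1-7 scale the recoded value is 8 minus the original, so 1 becomes 7 and 6 becomes 2, while the untouched `ext_1` column keeps its values.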

Scale Construction

def compute_scale_scores(df, scale_definitions, method="mean"):
    """
    Compute composite scale scores from individual items.

    Args:
        scale_definitions: dict mapping scale name to list of columns
        method: "mean" or "sum"

    Returns:
        DataFrame with new scale score columns
    """
    if method not in ("mean", "sum"):
        raise ValueError(f"method must be 'mean' or 'sum', got {method!r}")

    for scale_name, items in scale_definitions.items():
        if method == "mean":
            # Mean of the items each respondent actually answered
            df[scale_name] = df[items].mean(axis=1)
        else:
            # min_count makes the sum NaN when any item is missing,
            # instead of silently counting missing items as 0
            df[scale_name] = df[items].sum(axis=1, min_count=len(items))

        # Also compute Cronbach's alpha for reliability
        alpha = cronbachs_alpha(df[items])
        print(f"{scale_name}: alpha = {alpha:.3f} "
              f"(n_items = {len(items)})")

    return df


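def cronbachs_alpha(item_df):
    """
    Compute Cronbach's alpha for internal consistency reliability.
    Values above 0.70 are generally considered acceptable.

    Respondents with any missing item are dropped (listwise deletion).
    Returns NaN when alpha is undefined: fewer than two items, fewer
    than two complete respondents, or zero variance in the total score.
    """
    item_df = item_df.dropna()
    n_items = item_df.shape[1]
    if n_items < 2 or len(item_df) < 2:
        return np.nan

    item_variances = item_df.var(axis=0, ddof=1)
    total_variance = item_df.sum(axis=1).var(ddof=1)
    if total_variance == 0:
        return np.nan

    alpha = (n_items / (n_items - 1)) * (
        1 - item_variances.sum() / total_variance
    )
    return alpha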
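To make the formula concrete, the sketch below applies the same calculation, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), to two small made-up data sets. The `alpha_from_matrix` helper and both score matrices are illustrative assumptions.

```python
import numpy as np


def alpha_from_matrix(scores):
    """Cronbach's alpha for a respondents-by-items array without missing values."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Three items that move together perfectly across four respondents.
consistent = [[1, 1, 1], [2, 2, 2], [4, 4, 4], [5, 5, 5]]
# Hypothetical data in which the items agree less closely.
mixed = [[1, 2, 3], [2, 1, 4], [4, 5, 3], [5, 4, 5]]

alpha_consistent = alpha_from_matrix(consistent)
alpha_mixed = alpha_from_matrix(mixed)
print(round(alpha_consistent, 3), round(alpha_mixed, 3))
```

When the items are identical, the total-score variance is k² times a single item's variance, which drives alpha to 1. Weaker agreement between items lowers it.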

Open-Ended Response Processing

Coding Qualitative Responses

import re


def code_open_responses(df, text_column, codebook):
    """
    Apply a predefined codebook to open-ended responses using
    case-insensitive keyword matching. Keywords match as substrings,
    so "afford" also matches "affordable" and "time" also matches
    "sometimes". For research-quality coding, this should be
    supplemented with manual coding by trained raters.

    Args:
        codebook: dict mapping code names to keyword lists
        Example: {
            "financial_concern": ["money", "cost", "expensive", "afford"],
            "time_constraint": ["time", "busy", "schedule", "hours"],
            "quality_issue": ["quality", "broken", "defect", "poor"],
        }
    """
    for code_name, keywords in codebook.items():
        # Escape keywords so characters like "+" or "(" match literally
        pattern = "|".join(re.escape(k) for k in keywords)
        df[f"code_{code_name}"] = (
            df[text_column]
            .str.contains(pattern, case=False, na=False)
            .astype(int)
        )

    return df
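Substring matching can produce false positives. The sketch below compares it with whole-word matching on a few invented answers. The `keyword_flags` helper is hypothetical, and whole-word matching has its own cost: it misses inflected forms such as "costs".

```python
import re

import pandas as pd


def keyword_flags(texts, keywords, whole_words=True):
    """Return 1 where any keyword occurs in the text, else 0."""
    body = "|".join(re.escape(k) for k in keywords)
    # \b anchors restrict matches to whole words when requested.
    pattern = rf"\b(?:{body})\b" if whole_words else f"(?:{body})"
    return texts.str.contains(pattern, case=False, na=False).astype(int)


answers = pd.Series([
    "I never have enough time",
    "Sometimes the app is slow",
    None,
    "Too expensive (cost > $50)",
])
substring = keyword_flags(answers, ["time", "cost"], whole_words=False)
whole = keyword_flags(answers, ["time", "cost"], whole_words=True)
print(substring.tolist())
print(whole.tolist())
```

The second answer is coded as a time constraint only under substring matching, because "Sometimes" contains "time". Which behavior is preferable depends on the codebook, so it is worth spot-checking coded rows either way.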

Inter-Rater Reliability

When multiple coders classify open-ended responses:

Cohen's Kappa (2 raters):
  - < 0.00: poor agreement (worse than chance)
  - 0.00-0.20: slight
  - 0.21-0.40: fair
  - 0.41-0.60: moderate
  - 0.61-0.80: substantial
  - 0.81-1.00: almost perfect

Fleiss' Kappa (3+ raters):
  - Same interpretation scale as Cohen's
  - Use when more than two raters code the same responses

Process:
  1. Develop codebook with definitions and examples
  2. Train coders on 10-20 practice responses
  3. Code 20% of responses independently (overlap set)
  4. Calculate inter-rater reliability on the overlap set
  5. If kappa < 0.70, discuss disagreements and refine codebook
  6. Repeat until acceptable reliability is achieved
  7. Divide remaining responses among coders
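Step 4 can be computed without extra dependencies. The following is a minimal sketch of Cohen's kappa for two raters, using only the standard library. The function name and the ten coded responses are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same responses."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set")
    n = len(rater_a)
    # Observed agreement: share of responses given identical codes.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return float("nan")  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


coder_1 = ["cost", "cost", "time", "time", "quality",
           "cost", "time", "quality", "cost", "time"]
coder_2 = ["cost", "time", "time", "time", "quality",
           "cost", "time", "cost", "cost", "time"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Here the coders agree on 8 of 10 responses, but kappa is lower than 0.80 because some of that agreement is expected by chance. Under the process above, a value below 0.70 would send the coders back to refine the codebook.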

Data Reshaping for Analysis

Wide to Long Format

Survey data is typically exported in wide format (one row per respondent, one column per question). Many analyses require long format.

def reshape_repeated_measures(df, id_col, time_points,
                              measure_prefix):
    """
    Reshape repeated-measures survey data from wide to long.

    Example: columns q1_pre, q1_post -> long format with
    time column ("pre", "post") and value column.
    """
    value_vars = [f"{measure_prefix}_{t}" for t in time_points]

    long_df = pd.melt(
        df,
        id_vars=[id_col],
        value_vars=value_vars,
        var_name="time_point",
        value_name=measure_prefix
    )

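    # Strip the "<measure_prefix>_" prefix so only the time label remains.
    # removeprefix (pandas >= 1.4) matches literally and only at the start,
    # whereas str.replace may interpret the prefix as a regular expression.
    long_df["time_point"] = (
        long_df["time_point"]
        .str.removeprefix(f"{measure_prefix}_")
    )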

    return long_df
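`reshape_repeated_measures` handles one measure at a time. When several measures share the same time suffixes, `pd.wide_to_long` can reshape them in one call. The sketch below assumes columns named `<measure>_<time>`, and the data are invented.

```python
import pandas as pd

wide = pd.DataFrame({
    "respondent_id": [101, 102],
    "q1_pre": [3, 4],
    "q1_post": [5, 4],
    "q2_pre": [2, 1],
    "q2_post": [3, 2],
})

long_df = (
    pd.wide_to_long(
        wide,
        stubnames=["q1", "q2"],   # one output column per measure
        i="respondent_id",
        j="time_point",
        sep="_",
        suffix=r"\w+",            # the default suffix matches digits only
    )
    .reset_index()
    .sort_values(["respondent_id", "time_point"])
    .reset_index(drop=True)
)
print(long_df)
```

The result has one row per respondent and time point, with `q1` and `q2` as separate columns, which is the layout most mixed-model and repeated-measures routines expect.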

Export for Statistical Software

Export formats by software:

SPSS (.sav):
  - Use pyreadstat: pyreadstat.write_sav(df, "output.sav")
  - Include variable labels and value labels
  - Set measurement level (nominal, ordinal, scale)

Stata (.dta):
  - Use pandas: df.to_stata("output.dta")
  - Include variable labels with the variable_labels argument: df.to_stata("output.dta", variable_labels={...})

R (.csv with codebook):
  - Export CSV plus a separate codebook document
  - Or use pyreadr to write .rds format: pyreadr.write_rds("output.rds", df)
  - Include factor level definitions

General best practices:
  - Include a unique respondent ID column
  - Use numeric codes for categorical variables (with labels)
  - Document all recoding in a companion codebook
  - Save both raw and processed versions
  - Include a timestamp column for data versioning
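As a rough illustration of these practices, the sketch below writes a labeled Stata file plus a CSV with a companion JSON codebook. The file names, labels, and temporary output directory are assumptions, and the JSON layout is only one possible codebook format.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "satisfaction": [4, 5, 2],
})
variable_labels = {
    "respondent_id": "Unique respondent ID",
    "satisfaction": "Overall satisfaction (1-5)",
}
value_labels = {"satisfaction": {1: "Very dissatisfied", 5: "Very satisfied"}}

out_dir = Path(tempfile.mkdtemp())  # placeholder output location

# Stata: variable labels travel inside the .dta file.
df.to_stata(out_dir / "survey.dta", write_index=False,
            variable_labels=variable_labels)

# R or any other tool: plain CSV plus a machine-readable codebook.
df.to_csv(out_dir / "survey.csv", index=False)
codebook = {"variables": variable_labels, "values": value_labels}
(out_dir / "codebook.json").write_text(
    json.dumps(codebook, indent=2), encoding="utf-8"
)

print(sorted(p.name for p in out_dir.iterdir()))
```

Writing the codebook from the same dictionaries that label the Stata file keeps the two exports from drifting apart.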

Proper survey data processing is essential for valid statistical inference. Decisions made during cleaning and recoding directly affect research conclusions, making transparent documentation of every step a methodological requirement rather than a convenience.

Clean, recode, and prepare survey response data for analysis

1,232 stars

development

Updated May 26, 2026

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research survey-data-processing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanner results (status, scanner, purpose, percentage shown on the page):

  - Clean: Trivy, container and dependency vulnerability scanner (95%)
  - Clean: Semgrep, static code analysis for vulnerabilities (95%)
  - Clean: mcp-scan (Snyk), Model Context Protocol security validation (95%)
  - Skipped: Snyk (dep), open source security scanning (50%)
  - Skipped: Socket.dev, supply chain security analysis (50%)
  - Skipped: VirusTotal, multi-engine malware detection (50%)
  - Skipped: CrowdStrike, advanced threat intelligence (50%)
  - Skipped: OSV-Scanner, Open Source Vulnerability database check (50%)
  - Skipped: OWASP Dep-Check (50%)

Last scanned: May 26, 2026, 4:48 AM (31.0 s, 1 file scanned)

SKILL.md

name: survey-data-processing
description: Clean, recode, and prepare survey response data for analysis
emoji: 📋
category: analysis
subcategory: wrangling
keywords: ["survey data", "questionnaire coding", "Likert scale", "response validation", "recoding", "survey analysis"]
source: wentor-research-plugins

Install manually

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Copy into Claude Code skills folder (global)
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/43-wentorai-research-plugins/skills/analysis/wrangling/survey-data-processing ~/.claude/skills/

Repository

brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT