skills/43-wentorai-research-plugins/skills/analysis/statistics/survival-analysis-guide/SKILL.md
Conduct Kaplan-Meier, Cox regression, and time-to-event analyses
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research survival-analysis-guide
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
A skill for conducting time-to-event analyses including Kaplan-Meier estimation, log-rank tests, and Cox proportional hazards regression. Covers censoring concepts, assumption checking, and reporting standards for clinical and social science research.
Survival analysis studies the time until an event of interest occurs. Despite the name, the "event" need not be death -- it can be any well-defined transition:
Medical: Time to disease recurrence, death, or recovery
Engineering: Time to equipment failure
Social: Time to job termination, divorce, or graduation
Business: Time to customer churn or first purchase
Ecology: Time to species extinction in a habitat
Right censoring (most common):
The event has not occurred by the end of the study period.
Example: Patient is still alive at study end.
The survival time is "at least T" -- we know T but not the true event time.
Left censoring:
The event occurred before the observation period began.
Example: HIV infection detected, but seroconversion happened before testing.
Interval censoring:
The event occurred between two observation times.
Example: A patient tests negative at visit 3 and positive at visit 4.
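Right censoring is usually encoded as a (duration, event) pair per subject, which is the input format the estimators in this guide expect. A minimal sketch, assuming each subject has an entry date, an optional event date, and a shared study end date; the function name `to_duration_event` and the choice of days as the time unit are illustrative assumptions, not part of the original skill:

```python
from datetime import date
from typing import Optional


def to_duration_event(
    entry: date, event_date: Optional[date], study_end: date
) -> tuple[int, int]:
    """Return (duration_in_days, event_flag) for one subject.

    event_flag is 1 if the event was observed by study_end, else 0
    (right-censored at study_end). Hypothetical helper for illustration.
    """
    if event_date is not None and event_date <= study_end:
        return (event_date - entry).days, 1
    # No event observed within the study window: the true event time
    # is only known to be "at least" this long.
    return (study_end - entry).days, 0


# Subject with an observed event, and one still event-free at study end
observed = to_duration_event(date(2020, 1, 1), date(2020, 1, 31), date(2020, 12, 31))
censored = to_duration_event(date(2020, 1, 1), None, date(2020, 1, 11))
```

Left and interval censoring need a different representation (for example, a lower and an upper bound per subject) and are not handled by this sketch.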
def kaplan_meier(times: list[float], events: list[int]) -> dict:
    """
    Compute Kaplan-Meier survival estimates.

    Args:
        times: Observed times (event or censoring time)
        events: Event indicator (1 = event occurred, 0 = censored)

    Returns:
        Dict with time points and survival probabilities
    """
    data = sorted(zip(times, events), key=lambda x: x[0])
    n = len(data)
    unique_event_times = sorted(set(t for t, e in data if e == 1))
    survival = 1.0
    results = {"time": [0], "survival": [1.0]}
    at_risk = n
    idx = 0
    for t_event in unique_event_times:
        # Drop subjects censored strictly before this event time from the risk set
        while idx < n and data[idx][0] < t_event:
            if data[idx][1] == 0:
                at_risk -= 1
            idx += 1
        # Count events (d) and censorings (c) at exactly this time
        d = sum(1 for t, e in data if t == t_event and e == 1)
        c = sum(1 for t, e in data if t == t_event and e == 0)
        # Product-limit update: subjects censored at t_event still count as at risk here
        survival *= (at_risk - d) / at_risk
        results["time"].append(t_event)
        results["survival"].append(survival)
        # Everyone observed at t_event leaves the risk set afterwards
        at_risk -= (d + c)
        idx = max(idx, sum(1 for t, _ in data if t <= t_event))
    return results
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
kmf.fit(durations=time_column, event_observed=event_column, label="Overall")
# Plot the survival curve
kmf.plot_survival_function()
# Median survival time
print(f"Median survival: {kmf.median_survival_time_}")
# Survival probability at specific time
print(f"5-year survival: {kmf.predict(5.0):.3f}")
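The median survival time is read off the Kaplan-Meier step function as the first time at which the estimated survival drops to 0.5 or below. A minimal sketch of that rule, assuming `times` and `survival` are parallel lists sorted by time (the shape returned by the manual `kaplan_meier` function above); the function name `median_survival` and the `None` return for a curve that never reaches 0.5 are illustrative choices:

```python
from typing import Optional


def median_survival(times: list[float], survival: list[float]) -> Optional[float]:
    """Smallest time at which the step function S(t) is at or below 0.5."""
    for t, s in zip(times, survival):
        if s <= 0.5:
            return t
    # The curve never reaches 0.5, so the median is not estimable from these data
    return None


# Hypothetical curve for illustration only
example_median = median_survival([0, 2, 4, 6], [1.0, 0.8, 0.5, 0.2])
```

When more than half of the subjects are still event-free at the end of follow-up, the median cannot be estimated and should be reported as not reached.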
from lifelines.statistics import logrank_test
results = logrank_test(
    durations_A=group_a_times,
    durations_B=group_b_times,
    event_observed_A=group_a_events,
    event_observed_B=group_b_events,
)
print(f"Test statistic: {results.test_statistic:.3f}")
print(f"p-value: {results.p_value:.4f}")
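The log-rank test is the standard method for comparing survival curves. It tests the null hypothesis that the survival functions of the groups are identical. In lifelines, logrank_test compares two groups; for three or more groups, use multivariate_logrank_test from the same module. The test is most powerful when hazards are proportional, that is, when the hazard ratio between groups is constant over time.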
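To make the mechanics concrete, the two-group log-rank statistic can be sketched in plain Python: at each distinct event time, compare the events observed in group A with the number expected if both groups shared the same hazard, then sum over event times. This is an illustrative sketch under the standard hypergeometric variance, not the lifelines implementation; the function name `logrank_two_groups` is an assumption, and the quadratic-time loops favor readability over speed.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) on 1 df."""
    # Pool both groups, tagging group A as 0 and group B as 1
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk overall
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk in A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)     # events overall
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For real analyses, prefer the library routine, which also supports weighted variants of the test.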
from lifelines import CoxPHFitter
import pandas as pd
cph = CoxPHFitter()
cph.fit(
    df,
    duration_col="time",
    event_col="event",
    formula="age + treatment + stage",
)
cph.print_summary()
# Hazard ratios
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%", "p"]])
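Hazard Ratio (HR) = exp(coefficient)
HR = 1.0: No effect
HR > 1.0: Increased hazard (event tends to occur sooner; worse survival when the event is adverse)
HR < 1.0: Decreased hazard (event tends to occur later; better survival when the event is adverse)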
Example output:
treatment: HR = 0.65, 95% CI [0.48, 0.88], p = 0.005
Interpretation: Treatment group has 35% lower hazard of the event
compared to the control group.
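The arithmetic behind such a line can be sketched as follows: exponentiate the coefficient for the hazard ratio, exponentiate the Wald limits `coef ± z * se` for the confidence interval, and convert to a percent change with `(HR - 1) * 100`. The function name `hazard_ratio_summary` is illustrative, and the standard error of 0.155 is an assumed value chosen so that the result roughly matches the example interval; it does not come from a fitted model.

```python
import math


def hazard_ratio_summary(coef: float, se: float, z: float = 1.96) -> dict:
    """Hazard ratio, Wald confidence interval, and percent change in hazard."""
    hr = math.exp(coef)
    return {
        "hr": hr,
        "ci_lower": math.exp(coef - z * se),
        "ci_upper": math.exp(coef + z * se),
        # Negative values mean a lower hazard than the reference group
        "percent_change": (hr - 1.0) * 100.0,
    }


# Assumed inputs: coefficient log(0.65) with a hypothetical standard error
summary = hazard_ratio_summary(coef=math.log(0.65), se=0.155)
print(
    f"HR = {summary['hr']:.2f}, "
    f"95% CI [{summary['ci_lower']:.2f}, {summary['ci_upper']:.2f}], "
    f"change in hazard = {summary['percent_change']:.0f}%"
)
```

Note that the interval is symmetric on the log scale, not on the hazard-ratio scale, which is why the limits are computed before exponentiating.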
# Schoenfeld residuals test
cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True)
If the proportional hazards assumption is violated, consider: stratified Cox models, time-varying covariates, or accelerated failure time (AFT) models as alternatives.
1. Report number of events and total person-time at risk
2. Present Kaplan-Meier curves with number-at-risk tables
3. Report median survival with 95% confidence intervals
4. Report hazard ratios with 95% CIs and p-values
5. State which covariates were included in adjusted models
6. Report proportional hazards assumption test results
7. Specify how tied event times were handled (e.g., Efron or Breslow approximation)
8. Note any competing risks and how they were handled
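Item 1 of the checklist can be computed directly from the duration and event columns. A minimal sketch, assuming `times` holds each subject's follow-up duration and `events` the 0/1 event indicator; the function name `follow_up_summary` and the returned keys are illustrative:

```python
def follow_up_summary(times: list[float], events: list[int]) -> dict:
    """Number of subjects, number of events, total person-time, and crude event rate."""
    n_events = sum(events)
    person_time = sum(times)  # same unit as the durations, e.g. person-years
    return {
        "n_subjects": len(times),
        "n_events": n_events,
        "person_time": person_time,
        # Events per unit of person-time; undefined with zero follow-up
        "event_rate": n_events / person_time if person_time > 0 else float("nan"),
    }


# Hypothetical data for illustration only
summary = follow_up_summary([2.0, 3.0, 5.0], [1, 0, 1])
```

Reporting the crude rate alongside the Kaplan-Meier curve helps readers judge how much information the study actually contains.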