research-ops/skills/clinical-research/SKILL.md
Use when designing a prospective clinical study before submission — selecting and classifying endpoints (primary / key-secondary / exploratory, with surrogate-endpoint flagging), estimating sample size and power for two-arm designs (means / proportions / survival), or scoring a study plan for feasibility and a GO / GO-WITH-CONDITIONS / REDESIGN / NO-GO phase-gate decision. Every output is an ESTIMATE plus a named human owner (clinician / biostatistician / regulatory owner) — never clinical fact, never a finished protocol. Distinct from ra-qm-team, which handles the regulatory/QM submission (ISO 13485, EU MDR, FDA 510(k)/PMA/QSR), not the study design.
npx skillsauth add alirezarezvani/claude-skills clinical-research

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Prospective clinical study DESIGN: endpoints, sample size / power, and phase-gate feasibility. Every output is an estimate with stated assumptions routed to a named human owner. This skill never gives clinical advice as fact and never substitutes for a biostatistician or regulatory affairs.
R&D clinical teams, medical monitors, and biostatistics functions live at the moment between we-have-a-hypothesis and we-have-a-protocol-ready-for-submission. This skill structures three of the hardest design decisions at that point: endpoint selection and classification, sample size and power, and phase-gate feasibility.
Three deterministic tools:

- sample_size_estimator.py: Closed-form power / sample-size for two-arm means (Cohen's d), proportions (normal approximation), and survival (Schoenfeld events). Inflates for dropout. Prints an "ESTIMATE — confirm with a biostatistician" banner.
- endpoint_selector.py: Scores candidate endpoints across 5 weighted dimensions (clinical relevance, measurability, regulatory acceptance, sensitivity-to-change, burden) and classifies each as PRIMARY / KEY-SECONDARY / EXPLORATORY. Penalizes unvalidated surrogate endpoints.
- phase_gate_scorer.py: Scores a study plan 0-100 across recruitment feasibility, endpoint readiness, statistical power, operational complexity, and budget fit; returns GO / GO-WITH-CONDITIONS / REDESIGN / NO-GO plus the named owners who must sign.

Invoke this skill when selecting and classifying endpoints, estimating sample size and power for a two-arm design, or scoring a study plan for a phase-gate decision.
Do NOT use this skill to: prepare a regulatory submission or clinical evaluation report (use ra-qm-team), find or position a grant (use research/grants), design a live product A/B experiment (use product-team/experiment-designer), or replace a biostatistician's final sample-size justification.
1. assets/protocol_synopsis_template.md (objectives, design, population, endpoints, statistical plan placeholder, owners-to-sign).
2. endpoint_selector.py --input endpoints.json --profile {drug|device|biologic|diagnostic|digital-therapeutic}. Read the classification + surrogate flags. If >1 primary, plan multiplicity control.
3. sample_size_estimator.py --design {means|proportions|survival} .... Trace the effect/difference/HR to a published or anchor-based source; inflate for dropout.
4. phase_gate_scorer.py --input study.json --profile <same> --phase {1|2|3|4}. Read the verdict + blockers + named owners.

| Script | Purpose | Profiles |
|---|---|---|
| scripts/sample_size_estimator.py | Power / sample-size for means, proportions, survival | n/a (design-driven) |
| scripts/endpoint_selector.py | 5-dimension endpoint scoring + classification + surrogate flag | drug, device, biologic, diagnostic, digital-therapeutic |
| scripts/phase_gate_scorer.py | Feasibility 0-100 + GO/GO-WITH-CONDITIONS/REDESIGN/NO-GO + owners | drug, device, biologic, diagnostic, digital-therapeutic |
All three scripts are stdlib-only and accept --help, --sample, and --output {human,json}.
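The closed-form calculations named for the means and survival designs can be sketched as follows. This is a minimal illustration of the textbook formulas (normal approximation, two-sided test, 1:1 allocation), not the code inside sample_size_estimator.py; the function names and the default alpha and power are assumptions made for the example.

```python
import math
from statistics import NormalDist


def _z_sum_sq(alpha: float, power: float) -> float:
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def n_per_arm_means(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n to detect a standardized mean difference d (Cohen's d)."""
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    return math.ceil(2 * _z_sum_sq(alpha, power) / d ** 2)


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Total events needed for a log-rank test (Schoenfeld, 1:1 allocation)."""
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard ratio must be positive and different from 1")
    return math.ceil(4 * _z_sum_sq(alpha, power) / math.log(hazard_ratio) ** 2)


# Hypothetical inputs, for illustration only.
print(n_per_arm_means(0.5))      # 63 per arm
print(schoenfeld_events(0.70))   # 247 events in total
```

Note that the Schoenfeld figure counts events, not participants; turning it into an enrollment target needs an assumed event rate and follow-up time, which is a decision for the biostatistician.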
Run the onboarding questionnaire once before you start — it captures your defaults and named owners so every tool in this skill is pre-configured. Customization is the point: the answers actually change tool behavior.
python3 scripts/onboard.py # interactive (also: --defaults, --set key=value, --reset)
python3 scripts/onboard.py --show # see the questions + current effective config
Answers are saved to ~/.config/research-ops/clinical-research.json (global) or ./.research-ops/clinical-research.json (--scope project) and are read automatically by config_loader.py. They set the default development-area profile, default alpha / power / dropout, and the named biostatistician / medical monitor / regulatory owner printed on outputs. CLI flags always override saved config; RESEARCH_OPS_NO_CONFIG=1 ignores it entirely.
The seven questions: development area · alpha · power · dropout · biostatistician · medical monitor · regulatory owner.
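The precedence described above (saved config fills in defaults, CLI flags win, RESEARCH_OPS_NO_CONFIG=1 skips saved config) might look roughly like this. It is a sketch under stated assumptions, not the behavior of config_loader.py: the function name, the config keys, and the choice to let the project file override the global one are all hypothetical.

```python
import json
import os
from pathlib import Path

# File locations taken from the text above.
GLOBAL_CONFIG = Path.home() / ".config" / "research-ops" / "clinical-research.json"
PROJECT_CONFIG = Path(".research-ops") / "clinical-research.json"


def effective_config(cli_overrides, builtin_defaults, env=None,
                     paths=(GLOBAL_CONFIG, PROJECT_CONFIG)):
    """Merge built-in defaults, saved config, then CLI flags (highest priority)."""
    env = os.environ if env is None else env
    config = dict(builtin_defaults)
    if env.get("RESEARCH_OPS_NO_CONFIG") != "1":
        # Assumption: later paths (project scope) override earlier ones (global).
        for path in paths:
            path = Path(path)
            if path.is_file():
                config.update(json.loads(path.read_text()))
    # A flag left unset on the command line (None) does not override anything.
    config.update({k: v for k, v in cli_overrides.items() if v is not None})
    return config
```

For example, with a saved alpha of 0.025 and a CLI flag setting power to 0.85, the result keeps the saved alpha, takes the CLI power, and falls back to the built-in value for anything neither source sets.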
This skill ships an isolated, opt-in bridge to engineering/autoresearch-agent. Only when you ask to "optimize" / "run a loop" does an autoresearch experiment iteratively improve a study plan against this skill's own feasibility score. scripts/ar_evaluator.py is the ground-truth evaluator; it prints feasibility_composite: <0-100> (higher is better).
/ar:setup --domain custom --name trial-feasibility \
--target study.json \
--eval "python3 ar_evaluator.py --target study.json" \
--metric feasibility_composite --direction higher
/ar:loop custom/trial-feasibility
Isolated: no hard dependency — autoresearch runs only on demand, and the loop edits study.json, never the evaluator (locked ground truth).
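Because the evaluator reports its score as a feasibility_composite: <0-100> line, a caller can pull the number out of captured output with a small parser. The sketch below is illustrative only; how autoresearch-agent actually extracts the metric is not specified here, and parse_feasibility is a hypothetical helper.

```python
import re

# Matches a line such as "feasibility_composite: 72.5".
METRIC_LINE = re.compile(
    r"^feasibility_composite:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE
)


def parse_feasibility(stdout: str) -> float:
    """Return the 0-100 composite score found in evaluator output."""
    match = METRIC_LINE.search(stdout)
    if match is None:
        raise ValueError("no feasibility_composite line found")
    score = float(match.group(1))
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score


# Made-up output text, for illustration only.
print(parse_feasibility("some banner\nfeasibility_composite: 72.5\n"))  # 72.5
```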
- references/study_design_canon.md: ICH E8(R1) general considerations; ICH E9 + E9(R1) estimand addendum; CONSORT 2010; SPIRIT 2013; FDA Multiple Endpoints guidance (2022).
- references/endpoint_and_power.md: Cohen Statistical Power Analysis; Schoenfeld (1983) survival sample size; FDA Surrogate Endpoint Table / BEST glossary; FDA PRO guidance (2009); Chow, Shao & Wang Sample Size Calculations in Clinical Research.
- references/trial_operations.md: ICH E6(R2/R3) GCP; TransCelerate risk-based monitoring; FDA RBM guidance; CTTI recruitment best practices; site-feasibility scoring literature.

--profile. Company- or indication-specific precedent overrides the prior.

| Sibling / neighbor | Scope | Difference |
|---|---|---|
| ra-qm-team | ISO 13485 QMS, ISO 14971 risk, EU MDR tech docs + clinical evaluation, FDA 510(k)/PMA/De Novo/QSR submission | That is the submission; clinical-research designs the study beforehand |
| research/grants | NIH funding discovery + positioning | That finds funding; this designs the trial |
| product-team/experiment-designer | Live product A/B hypothesis + sample size | That is a product experiment; this is a clinical trial |
| research-finance (sibling) | R&D program budget + burn | That funds the program; this scopes the study |
python3 scripts/sample_size_estimator.py --sample
python3 scripts/sample_size_estimator.py --design proportions --p1 0.30 --p2 0.45 --dropout 0.15
python3 scripts/endpoint_selector.py --sample
python3 scripts/phase_gate_scorer.py --sample --output json
The sample correctly flags an unvalidated serum-cytokine surrogate (cannot be primary) and ranks PASI-75 as the PRIMARY endpoint; the phase-gate sample returns a verdict with a named owner chain.
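For orientation, the proportions command above (p1 = 0.30, p2 = 0.45, dropout 0.15) can be checked by hand with the standard unpooled normal-approximation formula followed by the 1/(1 − dropout) inflation. This is a sketch, assuming a two-sided alpha of 0.05 and 80% power, neither of which the command states; the script may use different defaults, a pooled variance, or a continuity correction, so its output can differ.

```python
import math
from statistics import NormalDist


def n_per_arm_proportions(p1: float, p2: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Per-arm n for comparing two proportions (unpooled normal approximation)."""
    nd = NormalDist()
    z_total = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z_total ** 2 * variance / (p1 - p2) ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Enroll n / (1 - dropout) so that about n participants complete."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    return math.ceil(n / (1 - dropout))


evaluable = n_per_arm_proportions(0.30, 0.45)
enrolled = inflate_for_dropout(evaluable, 0.15)
print(evaluable, enrolled)  # 160 189
```

Under these assumptions the design needs 160 evaluable participants per arm and 189 enrolled per arm. Like every number from this skill, that is an estimate to confirm with a biostatistician.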
These questions are walked one at a time by /cs:grill-research-ops or the orchestrator. Each carries a recommended answer and a canon citation. They are never bundled.
1. "Is your primary endpoint a clinical outcome or a surrogate, and if surrogate, is it on FDA's validated table?" Recommended: clinical outcome unless the surrogate is validated for this indication. Canon: FDA Surrogate Endpoint Table; BEST (Biomarkers, EndpointS, and other Tools) glossary.
2. "What's the minimal clinically important difference you're powering for, and where did that number come from?" Recommended: a published or anchor-based MCID, cited; never a convenience effect size. Canon: ICH E9; Cohen Statistical Power Analysis.
3. "What dropout rate are you assuming, and is the sample size inflated for it?" Recommended: inflate n by 1/(1 − dropout) using a justified rate. Canon: Chow, Shao & Wang; ICH E9(R1).
4. "Single primary endpoint or multiple, and if multiple, what's the multiplicity control?" Recommended: pre-specify alpha allocation (hierarchical / Bonferroni). Canon: FDA Multiple Endpoints guidance (2022).
5. "Who is the named biostatistician / medical monitor / regulatory owner signing this synopsis?" Recommended: name them now; this output is a recommendation, not a protocol. Canon: ICH E6(R2) GCP roles & responsibilities.
Walk depth-first. Lock 1-2 before opening 3-5. After all are answered, invoke endpoint_selector.py → sample_size_estimator.py → phase_gate_scorer.py.
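The two alpha-allocation options named in the multiplicity question behave differently, which a small comparison makes concrete. This is a minimal sketch of the textbook procedures with made-up p-values; neither function is part of this skill's scripts, and the pre-specified choice belongs in the statistical analysis plan.

```python
def bonferroni(p_values, alpha=0.05):
    """Test every endpoint at alpha / m, regardless of order."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def fixed_sequence(p_values, alpha=0.05):
    """Hierarchical testing: full alpha per endpoint, in pre-specified order.

    Testing stops at the first endpoint that fails; later endpoints
    cannot be declared significant.
    """
    decisions = []
    still_testing = True
    for p in p_values:
        rejected = still_testing and p <= alpha
        decisions.append(rejected)
        still_testing = rejected
    return decisions


# Hypothetical p-values for three endpoints in their pre-specified order.
p_values = [0.01, 0.20, 0.001]
print(bonferroni(p_values))      # [True, False, True]
print(fixed_sequence(p_values))  # [True, False, False]
```

The third endpoint passes under Bonferroni but not under the fixed sequence, because the hierarchy closes once the second endpoint fails. This is why the ordering has to be fixed before unblinding.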