skills/council/alchemist/ml-workflow/SKILL.md
Use when designing end-to-end ML workflows. Covers experiment tracking, feature engineering and storage, model training pipelines, serving and deployment, A/B testing, and drift monitoring. Do not use for data warehouse schema design (use schema-evaluation) or ETL pipeline architecture (use pipeline-design).
npx skillsauth add dtsong/my-claude-setup ml-workflowInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Design end-to-end ML workflows covering experiment tracking, feature engineering and storage, model training pipelines, model serving and deployment, A/B testing for models, and monitoring for data and model drift. Produces a workflow architecture, tool selection rationale, and operational runbook.
Reads ML code, configuration files, experiment logs, and infrastructure specs for analysis. Does not train models, execute experiments, or deploy to production.
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
Before any tooling decisions, formalize:
Document the problem statement, target variable, evaluation metric, and success threshold.
Map raw data to model-ready features:
Set up reproducible experiment management:
Build a reproducible, automated training workflow:
Plan how predictions reach users:
For real-time serving, specify: latency SLA (p50/p99), throughput (requests/second), scaling strategy (auto-scale triggers), and fallback behavior (what happens if the model is unavailable?).
Plan controlled rollout of model changes:
Plan ongoing model health monitoring:
Define retraining policy: scheduled (weekly/monthly), triggered (drift detected), or continuous (online learning).
Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.
# ML Workflow: [Project/Model Name]
## Problem Definition
| Aspect | Detail |
|--------|--------|
| Problem type | ... |
| Target variable | ... |
| Business metric | ... |
| Evaluation metric | ... |
| Baseline performance | ... |
| Success threshold | ... |
## Feature Engineering
| Feature | Source | Transformation | Type | Leakage Risk |
|---------|--------|---------------|------|-------------|
| ... | ... | ... | ... | Low/Med/High |
**Feature store:** [Yes/No — tool choice and rationale]
## Experiment Tracking
| Aspect | Choice | Rationale |
|--------|--------|-----------|
| Tool | ... | ... |
| What's tracked | ... | ... |
| Organization | ... | ... |
## Training Pipeline
[ASCII diagram showing data → features → train → evaluate → register]
| Stage | Tool/Method | Notes |
|-------|------------|-------|
| Data split | ... | ... |
| Training | ... | ... |
| Tuning | ... | ... |
| Validation | ... | ... |
| Registry | ... | ... |
## Model Serving
| Aspect | Detail |
|--------|--------|
| Serving mode | Batch / Real-time / Streaming / Edge |
| Latency SLA | ... |
| Throughput | ... |
| Scaling | ... |
| Fallback | ... |
## A/B Testing
| Aspect | Detail |
|--------|--------|
| Traffic split | ... |
| Primary metric | ... |
| Guardrail metrics | ... |
| Min duration | ... |
| Rollback criteria | ... |
## Monitoring and Drift
| Monitor | Tool | Threshold | Action |
|---------|------|-----------|--------|
| Data drift | ... | ... | ... |
| Model drift | ... | ... | ... |
| Concept drift | ... | ... | ... |
| Operational | ... | ... | ... |
**Retraining policy:** [Scheduled / Triggered / Continuous — details]
development
Use when planning implementation steps, deciding commit format, or structuring development approach. Provides brainstorm-plan-implement flow with conventional commits. Triggers on 'how should I approach this', 'commit format'.
development
Security audit checklist for web applications. Use when reviewing, auditing, or hardening a web app's security posture. Covers rate limiting, auth headers, IP blocking, CORS, security middleware, input validation, file upload limits, ORM usage, and password hashing. Triggers on requests like "review security", "harden this app", "security audit", "check for vulnerabilities", or when building/reviewing API endpoints.
development
Review UI code for Web Interface Guidelines compliance. Use when asked to "review my UI", "check accessibility", "audit design", "review UX", or "check my site against best practices".
development
React and Next.js performance optimization guidelines from Vercel Engineering. This skill should be used when writing, reviewing, or refactoring React/Next.js code to ensure optimal performance patterns. Triggers on tasks involving React components, Next.js pages, data fetching, bundle optimization, or performance improvements.