.agents/skills/ml-pipeline/SKILL.md
---
name: ml-pipeline
description: ML pipeline orchestrator — single entry point for ML-related tasks. Coordinates ML Engineer (analysis, modeling, recommendations), SRE Engineer (production data extraction), and Product Manager (task formulation). Domain context from `AGENTS.md`. MVP flow: define data requirements → extract from prod → analyze → model → recommend → feature plan.
context: fork
argument-hint: [analysis goal or dataset description]
codex-roles:
  - product-manager
  - ml-engineer
---
Single entry point for ML-related tasks. Orchestrates three roles across a data-driven analysis pipeline: Product Manager formulates the task, ML Engineer owns analysis and modeling, SRE Engineer extracts production data. Domain specifics are read from the project's AGENTS.md.
Architecture note: This workflow is extensible — future sub-flows (model training, prompt tuning, A/B testing) can be added as numbered branches in Step 2.
Gather the ML task from the user:
If the user's request is vague, proceed to Step 2, where the product-manager role will help formulate it.
Select the pipeline based on the ML task:
| Pipeline Type | When to Use | Status |
|---|---|---|
| Data Analysis | Extract prod data → analyze → model → recommend parameter/prompt changes | ✅ Active |
| Model Training | Train or retrain ML models on collected data | 🔮 Future |
| Prompt Tuning | Optimize LLM prompts based on output quality analysis | 🔮 Future |
| A/B Test Analysis | Analyze results of live experiments | 🔮 Future |
For the MVP, proceed with the Data Analysis pipeline (Steps 3–9). Future pipeline types will be added as parallel branches from this step.
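The selection rule in the table above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the names `PipelineType`, `ACTIVE_PIPELINES`, and `select_pipeline` are hypothetical, and raising an error for a Future pipeline is an assumption about how an unsupported choice might be handled.

```python
from enum import Enum


class PipelineType(Enum):
    # Values mirror the "Pipeline Type" column of the table.
    DATA_ANALYSIS = "Data Analysis"
    MODEL_TRAINING = "Model Training"
    PROMPT_TUNING = "Prompt Tuning"
    AB_TEST_ANALYSIS = "A/B Test Analysis"


# Only Data Analysis is marked Active; the other three are marked Future.
ACTIVE_PIPELINES = {PipelineType.DATA_ANALYSIS}


def select_pipeline(requested: PipelineType) -> PipelineType:
    """Return the requested pipeline if it is active, otherwise fail loudly."""
    if requested in ACTIVE_PIPELINES:
        return requested
    # Assumption: an inactive pipeline is rejected rather than silently replaced.
    raise NotImplementedError(f"{requested.value} pipeline is not available yet")
```

Adding a future branch would then amount to adding its member to `ACTIVE_PIPELINES` once the corresponding steps exist.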
Apply product-manager role.
Read and internalize the project's domain:
- AGENTS.md (root): tech stack, domain description, business context
- ARCHITECTURE.md: system components, data flow, service boundaries
- FEATURES.md (if exists): current feature inventory, scoring/evaluation parameters

Extract and summarize:
Present the domain summary to the user for confirmation before proceeding.
Continue with product-manager role.
Transform the user's request into a structured ML task specification:
## ML Task Specification
### Objective
[What we want to learn or improve — measurable outcome]
### Business Context
[Why this matters — which product decisions depend on this analysis]
### Success Criteria
[How we know the analysis is complete and actionable]
- [ ] Data requirements defined and extracted
- [ ] Statistical analysis performed with confidence intervals
- [ ] Recommendations backed by evidence (not intuition)
- [ ] Action plan produced (parameter changes, prompt modifications, or feature work)
### Scope
- **In scope**: [specific parameters, data ranges, user segments]
- **Out of scope**: [what we explicitly skip]
### Constraints
- Data sensitivity: [PII handling, anonymization needs]
- Time range: [how far back to analyze]
- Compute budget: [any limits on processing]
Present to user for approval. Adjust based on feedback.
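If the specification needs to be checked for completeness before it is presented, the template above could be mirrored in a small data structure. The sketch below is one possible shape, assuming hypothetical names (`MLTaskSpec`, `missing_sections`) and assuming that Objective, Business Context, in-scope items, data sensitivity, and time range are the fields that must be filled; the skill itself does not prescribe this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MLTaskSpec:
    """Hypothetical container mirroring the ML Task Specification template."""

    objective: str
    business_context: str
    in_scope: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    data_sensitivity: str = ""
    time_range: str = ""
    compute_budget: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "Objective": self.objective,
            "Business Context": self.business_context,
            "Scope (in scope)": self.in_scope,
            "Constraints: data sensitivity": self.data_sensitivity,
            "Constraints: time range": self.time_range,
        }
        return [name for name, value in required.items() if not value]
```

An empty result from `missing_sections()` would indicate the draft is ready to show to the user for approval.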
Apply ml-engineer role.
Based on the ML task specification from Step 4, define exactly what data is needed:
<data_requirements>
Produce a concrete data extraction plan — either SQL queries, API calls, or log extraction commands that sre-engineer role can execute.
Present the data requirements and extraction plan to the user for review.
Apply sre-engineer role.
Execute the data extraction plan from Step 5. This step uses SRE's production access and operational expertise.
⚠️ SAFETY: All commands are READ-ONLY. No mutations to production data or systems.
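One way to back the read-only rule with a mechanical check is to screen SQL text before it is run. The function below is a conservative sketch under stated assumptions: `is_read_only` and the keyword list are hypothetical, the check is purely textual, and it will reject some harmless queries (for example a string literal containing the word "update"). It does not replace a read-only database role or replica.

```python
import re

# Keywords that indicate a mutating or administrative statement.
# "into" is included so that SELECT ... INTO is rejected as well.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke"
    r"|merge|call|copy|into)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept only a single SELECT/WITH statement with no mutating keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # More than one statement in the input: reject outright.
        return False
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(statement) is None
```

Erring on the side of rejection fits the intent here: a falsely rejected query can be reviewed by hand, while a falsely accepted one touches production.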
Follow analyze-prod skill Steps 3a–3b to verify cloud platform context. Consult cloud-platforms skill for platform-specific CLI commands.
Based on the extraction plan:
| Data Source | Method |
|---|---|
| Managed DB (SQL) | Platform-specific connection (see cloud-platforms skill CLI Reference → Data Extraction) + psql/mysql query |
| Application logs | Platform-specific log query (see cloud-platforms skill Observability → Diagnostic Commands) |
| Monitoring metrics | Platform-specific metrics API (see cloud-platforms skill Observability) |
| Data warehouse | Platform-specific query tool (BigQuery / Synapse / Athena) |
| Object storage | Platform-specific CLI (gsutil / az storage / aws s3) |
| Kubernetes pod data | kubectl logs or kubectl exec (read-only) |
For each extraction:
data/raw/)

Present extraction summary:
## Data Extraction Summary
- Source: [database/logs/metrics]
- Records extracted: [count]
- Time range: [from — to]
- Files: [local paths]
- Data quality notes: [nulls, anomalies, gaps]
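The record count, time range, and null counts in the summary above can be derived directly from an extracted file. The following sketch assumes the extract is CSV text with a header row and a timestamp column in a uniform ISO 8601 format; the function name `summarize_extraction` and the column names in the usage example are illustrative only.

```python
import csv
import io


def summarize_extraction(csv_text: str, time_column: str) -> dict:
    """Count records, empty cells per column, and the observed time range."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    nulls = {}
    for row in rows:
        for column, value in row.items():
            if column is None:
                # Extra unnamed fields on a malformed row; not counted here.
                continue
            if value is None or value.strip() == "":
                nulls[column] = nulls.get(column, 0) + 1
    # Uniformly formatted ISO 8601 timestamps sort correctly as plain strings.
    times = sorted(row[time_column] for row in rows if row.get(time_column))
    return {
        "records": len(rows),
        "time_range": (times[0], times[-1]) if times else None,
        "nulls": nulls,
    }


# Usage with a tiny made-up sample (not real production data):
sample = "ts,score\n2024-01-02T00:00:00Z,0.5\n2024-01-01T00:00:00Z,\n"
print(summarize_extraction(sample, "ts"))
```

Anomalies and gaps still need a human or a dedicated check; this only covers the countable parts of the summary.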
Apply ml-engineer role.
Perform the analysis following ML Engineer's reasoning protocol:
Present analysis as a structured report:
## Analysis Report
### Data Summary
- Records analyzed: [count]
- Time period: [range]
- Segments: [if applicable]
### Key Findings
1. [Finding with statistical evidence]
2. [Finding with statistical evidence]
3. [Finding with statistical evidence]
### Visualizations
[Charts, plots, tables illustrating findings]
### Model Performance (if applicable)
- Baseline: [metric = value]
- Proposed: [metric = value]
- Improvement: [delta with confidence interval]
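For the "delta with confidence interval" line, a percentile bootstrap is one common choice when little is known about the metric's distribution. The sketch below uses only the standard library and is an assumption about method, not something the skill mandates; `bootstrap_delta_ci` and its parameters are hypothetical, and the per-sample metric values would come from the analysis itself.

```python
import random
import statistics


def bootstrap_delta_ci(baseline, proposed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(proposed) - mean(baseline).

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    deltas = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        b = rng.choices(baseline, k=len(baseline))
        p = rng.choices(proposed, k=len(proposed))
        deltas.append(statistics.fmean(p) - statistics.fmean(b))
    deltas.sort()
    lower = deltas[int((alpha / 2) * n_resamples)]
    upper = deltas[int((1 - alpha / 2) * n_resamples) - 1]
    point = statistics.fmean(proposed) - statistics.fmean(baseline)
    return point, lower, upper
```

If the lower bound stays above zero, the improvement is unlikely to be resampling noise; an interval that straddles zero suggests reporting the change as inconclusive.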
Continue with ml-engineer role, consult product-manager role for business framing.
Translate analysis findings into actionable recommendations:
## ML Recommendations
### Summary
[One paragraph: what the analysis revealed and what should change]
### Recommended Changes
#### Parameter/Weight Changes
| Parameter | Current Value | Recommended Value | Expected Impact | Confidence |
|---|---|---|---|---|
| [param] | [current] | [recommended] | [metric improvement] | [high/medium/low] |
#### Prompt Modifications (if applicable)
| Prompt | Current Issue | Recommended Change | Expected Improvement |
|---|---|---|---|
| [prompt location] | [what's wrong] | [specific edit] | [quality metric] |
When recommendations include prompt or context pipeline changes (RAG, memory, agent harness, token budget), consult `context-engineering` skill for architecture patterns and production checklists.
#### Other Recommendations
- [Infrastructure, data pipeline, monitoring changes]
### Evidence
[Reference to analysis report, key metrics, statistical tests]
### Risks
| Risk | Mitigation |
|---|---|
| [what could go wrong] | [how to handle it] |
### Validation Plan
- [ ] A/B test for [specific changes] over [duration]
- [ ] Monitor [metrics] for regression
- [ ] Rollback criteria: [threshold]
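The rollback criterion in the validation plan can be written down as an explicit predicate so that the threshold is agreed before the change ships. This is a minimal sketch assuming a higher-is-better metric and a relative threshold; `should_roll_back` and `max_relative_drop` are hypothetical names.

```python
def should_roll_back(baseline: float, observed: float, max_relative_drop: float) -> bool:
    """True when the metric fell by more than the allowed fraction of baseline.

    Assumes a higher-is-better metric; invert the comparison for error rates.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative comparison")
    relative_drop = (baseline - observed) / baseline
    return relative_drop > max_relative_drop
```

For example, with a baseline of 0.80 and an allowed drop of 5%, an observed value of 0.70 triggers a rollback, while 0.79 does not.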
Present recommendations to the user. Wait for approval before proceeding.
After user approves the recommendations, invoke feature-plan skill with the following input:
Feature specification (auto-generated from Step 8):
feature-plan skill will decompose the work into role-scoped work packages and produce an implementation plan.
Present the completed ML pipeline run:
- product-manager role, ml-engineer role, sre-engineer role
- feature-dev skill per work package
- analyze-prod skill (production data access, Steps 6a–6b)
- feature-plan skill (Step 9), then feature-dev skill (implementation)
- product-manager role (task formulation), ml-engineer role (analysis, modeling), sre-engineer role (data extraction)
- test-strategy skill (validation plan), context-engineering skill (context pipeline design, RAG, memory engineering, agent harness; for LLM/prompt recommendations), prompt-engineering skill (technique selection, eval strategy)