skills/ai-engineer/SKILL.md
--- name: ai-engineer description: Use this skill when a research paper or research team output needs to be transformed into a working full-stack product. Activate when the task involves bridging research findings into production software — including writing a PRD, selecting a tech stack, designing AI/ML integration, building UI (delegated to auto-website-builder), backend development, and scaling the system. This skill acts as the engineering lead: it consults the research lead at every critica
Engineering lead for research-to-product builds: orchestrate the full stack alongside the research lead and researcher team, delegating the web presence and UI to auto-website-builder.
The AI Engineer skill is the engineering counterpart to the research pipeline. Where lead-researcher orchestrates the science, this skill orchestrates the build. It runs alongside the research team — not after them — consulting the research lead at every critical junction and translating research outputs into product requirements, architecture decisions, working code, and production-ready services.
The AI Engineer does not build the UI alone. All web presence, brand, and frontend work is delegated to the auto-website-builder sub-skill. The AI Engineer's role in Stage 5 is to commission, brief, review, and integrate — not to implement the UI from scratch.
Pipeline stages:
1. Research Onboarding & Researcher Consultation
↓
2. PRD Creation
↓
3. Tech Stack Architecture
↓
4. AI/ML Integration Design [← continuous research-team touchpoint]
↓
5. UI & Web Presence [→ delegate to: auto-website-builder]
↕ (parallel)
6. Backend Development
↓
7. Integration, Testing & QA
↓
8. Scaling & Production Hardening
↓
9. Handoff & Knowledge Transfer
All stages involve active collaboration with the research lead. Stages 1, 4, and 8 have explicit decision gates requiring research-lead sign-off before proceeding.
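The gate rule above can be expressed as a small check. This is a minimal sketch, not part of the skill itself: the names `GATED_STAGES`, `can_advance`, and the `signoffs` set are hypothetical, and the stage numbers follow the pipeline listed above.

```python
# Hypothetical sketch: names are illustrative assumptions, not a defined API.
GATED_STAGES = {1, 4, 8}  # stages that need research-lead sign-off


def can_advance(stage: int, signoffs: set[int]) -> bool:
    """Return True if work may proceed past `stage`.

    `signoffs` holds the stage numbers the research lead has approved.
    """
    if not 1 <= stage <= 9:
        raise ValueError(f"unknown stage: {stage}")
    # Ungated stages always pass; gated stages need a recorded sign-off.
    return stage not in GATED_STAGES or stage in signoffs
```

For example, `can_advance(4, {1})` is false until Stage 4 is added to the sign-off set.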
The AI Engineer orchestrates the following sub-skills. Invoke them at the stages indicated — do not duplicate their work inline.
| Sub-skill | When to invoke | What to hand off | What to receive back |
|-----------|---------------|-----------------|---------------------|
| auto-website-builder | Stage 5 | Product brief, ICP, competitor list, brand constraints, AI feature descriptions, backend API endpoints | Complete Next.js site, brand system, all page content, SVG logo, design tokens |
| lead-researcher | As needed during Stages 1–2 | Research question, paper title/link | Research brief, literature synthesis, hypothesis |
| literature-synthesis | If no synthesis exists at Stage 1 | Research topic and paper list | Structured synthesis document |
| research-paper-review | If a specific paper needs critique | Paper title/link, differentiation question | Review report with gap analysis |
Briefing discipline: When invoking a sub-skill, always provide a written brief. Never hand off verbally or with ambiguous context. The brief for auto-website-builder is specified in Stage 5 below.
Trigger: Always first. Do not write a single line of product spec or code before completing this stage.
| # | Question | Why it matters |
|---|----------|----------------|
| 1 | What is the core research contribution? (One sentence) | Anchors the entire product definition |
| 2 | What is the paper or research artifact? (Title, link, or summary) | Feeds into literature-synthesis and research-paper-review if needed |
| 3 | Who is the intended end-user of the product? | Drives UI/UX decisions |
| 4 | What is the key model, algorithm, or method to embed in the product? | Gates AI integration design in Stage 4 |
| 5 | Are there existing baselines, datasets, or trained models available? | Determines build vs. integrate decisions |
| 6 | What are the hard constraints? (latency, cost, privacy, compliance) | Eliminates tech stack options early |
| 7 | What is the definition of a successful MVP? | Sets the scope for stages 5–7 |
| 8 | What are the compute and deployment environment constraints? | Drives cloud and infra decisions |
Produce a Research-to-Product Brief (markdown, ~1 page):
Decision gate: Get explicit sign-off from the research lead before proceeding to Stage 2.
Reference: references/prd-template.md for full PRD structure.
Trigger: After Stage 1 sign-off.
As a [user], I want to [action] so that [outcome]. Cover the core AI-powered workflow end-to-end.
Before finalizing the PRD:
- PRD.md: full product requirements document
- RESEARCH-DEPENDENCIES.md: dependency table extracted from the PRD
Reference: references/tech-stack-guide.md for decision frameworks and recommended stacks.
Trigger: After PRD is approved.
Evaluate each layer of the stack against:
| Layer | Decision | Options to consider |
|-------|----------|-------------------|
| AI/ML serving | Inference framework and API | FastAPI + vLLM, Triton, Hugging Face TGI, custom PyTorch server |
| Backend | Language and web framework | Python/FastAPI, Node/Express, Go/Gin |
| Database | Primary store, vector store, cache | PostgreSQL, MongoDB, Pinecone/Weaviate, Redis |
| Frontend | UI framework | Next.js, React + Vite, Svelte |
| Auth | Authentication and authorization | Clerk, Auth0, Supabase Auth, custom JWT |
| Queue / async | Task queue and message broker | Celery + Redis, BullMQ, RabbitMQ, Kafka |
| Infra | Cloud provider and compute | AWS, GCP, Azure; GPU instance type |
| CI/CD | Build, test, deploy pipeline | GitHub Actions, CircleCI, ArgoCD |
| Observability | Logging, metrics, tracing | Grafana + Prometheus + Loki, Datadog, OpenTelemetry |
- ARCHITECTURE.md: full stack diagram (text-based) plus per-layer decisions with rationale
- ADRs/ (Architecture Decision Records): one markdown file per significant decision, especially for AI/ML serving and data storage
Trigger: After architecture is confirmed. This is the highest-collaboration stage with the research team.
Before writing integration code, agree in writing with the researcher team on:
| Contract | Detail |
|----------|--------|
| Model API | Input format, output format, schema, versioning |
| Inference endpoint | gRPC vs. REST, authentication, rate limits |
| Model artifact | Location, format (ONNX, PyTorch, HF), versioning |
| Fallback behavior | What happens when the model returns low-confidence or errors |
| Evaluation metrics | How model quality is monitored in production |
| Retraining triggers | When and how the model is updated |
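One way to make the contract above concrete is to record it as a typed structure and encode the agreed fallback next to it. The sketch below is illustrative only: `ModelContract`, `Prediction`, `apply_fallback`, and the confidence threshold are hypothetical names and values, and the real fields should come from the written agreement with the researcher team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelContract:
    """Hypothetical record of the agreed interface; all fields are illustrative."""
    model_version: str      # versioning agreed with the research team
    transport: str          # "rest" or "grpc"
    artifact_format: str    # e.g. "onnx", "pytorch", "hf"
    min_confidence: float   # below this value, the fallback applies


@dataclass(frozen=True)
class Prediction:
    label: Optional[str]
    confidence: float
    used_fallback: bool = False


def apply_fallback(pred: Prediction, contract: ModelContract) -> Prediction:
    """Replace a low-confidence prediction with an explicit 'no answer' result."""
    if pred.confidence < contract.min_confidence:
        return Prediction(label=None, confidence=pred.confidence, used_fallback=True)
    return pred
```

Keeping the threshold inside the contract object means a change to fallback behavior shows up as a contract change, which is the kind of decision the research lead should sign off on.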
Decision gate: Present integration design to research lead. Confirm model interface contract is signed off before building dependent UI/backend layers.
- AI-INTEGRATION.md: interface contract, integration pattern, data flow diagram
Sub-skill: auto-website-builder
Trigger: After AI integration design is confirmed (parallel with Stage 6 where feasible).
The AI Engineer does not build the UI directly. This stage has three responsibilities: write a precise brief for auto-website-builder, review its output against research and product requirements, and integrate the generated frontend with the backend and AI layers.
Compose a written brief covering every input auto-website-builder needs. Do not invoke it before the brief is complete.
| Brief field | Source | Notes |
|-------------|--------|-------|
| What does the product do? (1–3 sentences) | PRD problem statement | Translate from research jargon to user language |
| Primary buyer and end user | PRD Stage 1 ICP | |
| Biggest pain eliminated | PRD user stories | Lead with benefit, not feature |
| 3 direct or indirect competitors | Research brief / PRD | |
| Industry vertical | PRD | |
| B2B, B2C, or developer-facing? | PRD | |
| Product stage | PRD | MVP / Early access / GA |
| Existing name, logo, or brand assets | Stage 1 intake | Provide if researcher team has brand constraints |
| Primary goal of the site | PRD goals section | Leads / signups / downloads / docs traffic |
| AI feature descriptions (for product page) | Stage 4 AI-INTEGRATION.md | Plain-language descriptions of what the AI does; avoid model internals |
| Backend API endpoints (for docs / implementation page) | Stage 6 OpenAPI spec | Share endpoint list so auto-website-builder can generate accurate implementation steps |
| Hard constraints | PRD constraints section | Privacy policy requirements, compliance badges, on-prem availability |
| Research paper or publication link (if public) | Stage 1 | For credibility / "Built on research" section |
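A completeness check can enforce the rule that the brief is finished before auto-website-builder is invoked. The sketch below is an assumption-laden illustration: the field keys are hypothetical shorthand for the table rows, and treating brand assets and the paper link as optional is inferred from the "Provide if" and "(if public)" wording rather than stated by the skill.

```python
# Hypothetical keys standing in for the required rows of the brief table.
REQUIRED_FIELDS = (
    "product_summary",
    "buyer_and_end_user",
    "biggest_pain",
    "competitors",
    "industry_vertical",
    "audience_type",
    "product_stage",
    "site_goal",
    "ai_feature_descriptions",
    "backend_api_endpoints",
    "hard_constraints",
)


def missing_fields(brief: dict[str, str]) -> list[str]:
    """Return required fields that are absent or blank, in table order."""
    return [f for f in REQUIRED_FIELDS if not brief.get(f, "").strip()]
```

An empty result from `missing_fields` would be the signal that the brief is ready to hand off.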
Hand off the completed brief and let auto-website-builder run its full pipeline (Phases 1–7). Do not interrupt or override its brand, messaging, or code generation decisions unless they conflict with a constraint in the brief.
Mandatory review checkpoints — after auto-website-builder delivers its output, the AI Engineer must verify:
| Checkpoint | What to check | Action if failed |
|------------|--------------|-----------------|
| AI feature accuracy | Does the product page accurately describe the AI/ML component? No overclaiming, no underclaiming. | Provide corrected copy to auto-website-builder for revision |
| Research fidelity | Are any research-derived claims (accuracy numbers, benchmarks, paper citations) correct? | Escalate to research lead for approval before launch |
| API documentation accuracy | Do implementation steps and docs match the actual backend API endpoints and auth model? | Update with correct endpoint details |
| Compliance section | Does the privacy policy cover the actual data the product collects? | Flag gaps; advise user to have legal review |
| Brand alignment | Do brand constraints from Stage 1 (e.g., researcher team's existing color scheme) conflict with generated brand? | Surface conflict; defer to research lead |
After auto-website-builder delivers the Next.js codebase, the AI Engineer extends it with AI-specific components that require engineering knowledge to implement:
| Component | Purpose | Implementation notes |
|-----------|---------|---------------------|
| Streaming output display | Render incremental model responses | Use SSE or WebSocket; add incremental <TextStream> component |
| Confidence / uncertainty indicator | Surface model confidence scores | Validate display thresholds with research lead before shipping |
| Async job status poller | Track long-running inference jobs | Poll GET /jobs/{id} or use WebSocket push |
| Model error states | Distinguish model errors from system errors | Separate error copy: "Our AI couldn't process this" vs "Service unavailable" |
| Feedback capture | Thumbs up/down or correction input | Only add if research team needs production feedback for model improvement |
| API key / auth flow | Connect frontend auth to backend | Wire Clerk/Auth0 tokens to backend API authorization header |
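The async job status poller row can be illustrated with a short polling loop. This is a sketch under stated assumptions: `fetch_status` is a hypothetical stand-in for a call to GET /jobs/{id}, the status strings are invented for the example, and the production component would live in the frontend rather than in Python.

```python
import time
from typing import Callable

TERMINAL_STATES = {"succeeded", "failed"}  # assumed status values


def poll_job(
    fetch_status: Callable[[str], str],
    job_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state or attempts run out.

    `fetch_status` returns the job's current status string.
    Raises TimeoutError if no terminal state is seen in `max_attempts` polls.
    """
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A WebSocket push, the other option named in the table, would remove the need for polling altogether.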
- Written brief for auto-website-builder with all required fields
- auto-website-builder output reviewed against all 5 checkpoints above
- Site delivered by auto-website-builder (brand, all pages, copy, design tokens, SVG logo)
Trigger: After PRD and architecture confirmed (parallel with Stage 5 where feasible).
| Module | Responsibility |
|--------|----------------|
| Auth | User identity, session management, role-based access |
| Model gateway | Wraps the AI integration client; handles routing, retries, rate limits |
| Data layer | CRUD operations, ORM/query builder, migrations |
| Job queue | Async task management for heavy inference or batch jobs |
| Webhooks / events | Notify frontend or external systems of async results |
| Admin API | Internal endpoints for monitoring, model management, feature flags |
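The retry responsibility of the model gateway might look like the sketch below. It assumes a split between model errors and system errors similar to the one the frontend surfaces to users. The exception classes, `call_with_retries`, and the backoff values are hypothetical, not part of any named library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ModelError(Exception):
    """The model ran but could not produce a usable answer (not retried)."""


class SystemUnavailable(Exception):
    """Transient infrastructure failure (safe to retry)."""


def call_with_retries(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `call`, retrying only transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except SystemUnavailable:
            if attempt == max_retries:
                raise  # retries exhausted: surface the system error
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

A `ModelError` propagates on the first attempt because repeating the same input is unlikely to change the model's answer, while a `SystemUnavailable` is retried with delays of 0.5 s, 1 s, and 2 s under the defaults.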
Trigger: After Stages 5 and 6 are functionally complete.
| Level | Scope | Tools |
|-------|-------|-------|
| Unit | Individual functions and modules | pytest, Jest, Vitest |
| Integration | Service-to-service, DB, model API | pytest, Supertest |
| End-to-end | Full user journey through UI | Playwright, Cypress |
| AI/ML quality | Model output correctness in product context | Custom eval suite (consult research team) |
| Load | Throughput and latency under expected peak load | k6, Locust |
| Security | OWASP Top 10 basics, auth boundary checks | Manual + automated scan |
Reference: references/scaling-playbook.md for patterns and runbooks.
Trigger: After Stage 7 QA pass.
| Dimension | Target | Approach |
|-----------|--------|----------|
| Inference throughput | Requests/sec under peak | Model batching, GPU auto-scaling, request queuing |
| Backend throughput | API requests/sec | Horizontal pod autoscaling, connection pooling |
| Data volume | Storage growth rate | Partitioning, archival strategy, index optimization |
| Latency | P95 and P99 targets from PRD | CDN for static assets, caching layer, async offload |
| Availability | Uptime SLA | Multi-AZ or multi-region deployment, health checks, circuit breakers |
| Cost | Cost per inference / cost per user | Spot instances, request batching, model quantization |
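To check the latency row against load test results, the P95 and P99 values can be computed from raw samples. The sketch below uses the nearest-rank definition of a percentile, which is one of several conventions and may differ slightly from what a given load testing tool reports. The function names and the millisecond unit are assumptions.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_targets(
    samples_ms: list[float], p95_target_ms: float, p99_target_ms: float
) -> bool:
    """Return True only if both the P95 and P99 latencies are within target."""
    return (
        percentile(samples_ms, 95) <= p95_target_ms
        and percentile(samples_ms, 99) <= p99_target_ms
    )
```

The targets themselves should come from the PRD, as the table states.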
Decision gate: Present scaling plan and hardening checklist to research lead. Confirm model rollback and retraining integration points before going live.
- RUNBOOK.md: operational runbook (deploy, scale, roll back, incident response)
Trigger: After Stage 8 production readiness is confirmed.
At every stage, any decision that affects:
…must be surfaced to the research lead before being implemented. Do not silently override research constraints with engineering pragmatism.
Maintain an ENGINEERING-LOG.md alongside the Research Log. After each stage:
## Stage N — [Name] — [Date]
Status: complete / in-progress / blocked
Key decisions: [list with rationale]
Research team touchpoints: [summary of what was discussed and agreed]
Open items: [list]
Escalate to the research lead immediately when:
Engineering velocity does not justify silently degrading the AI/ML component's fidelity. If a deadline forces a trade-off, surface it explicitly to the research lead and document the decision.
| User intent | Entry point | Notes |
|-------------|-------------|-------|
| "We have a paper, build the product" | Stage 1 → full pipeline | Run research-paper-review in parallel with Stage 1; auto-website-builder runs at Stage 5 |
| "PRD exists, build it" | Stage 3 → full pipeline | Confirm research dependencies table exists; brief auto-website-builder at Stage 5 |
| "Stack is chosen, build AI integration + app" | Stage 4 → 5 → 6 → 7 → 8 | Verify interface contract with research team before Stage 4; run auto-website-builder at Stage 5 in parallel with Stage 6 |
| "Just build the website/marketing site" | Stage 5 only | Write the brief from PRD and invoke auto-website-builder directly |
| "MVP built, make it production-ready" | Stage 7 → 8 → 9 | Run QA first to identify gaps before hardening |
| "Scale an existing deployment" | Stage 8 directly | Use scaling-playbook reference |
| Stage | Artifact | Owner |
|-------|----------|-------|
| 1 | Research-to-Product Brief (approved by research lead) | AI Engineer |
| 2 | PRD.md, RESEARCH-DEPENDENCIES.md | AI Engineer |
| 3 | ARCHITECTURE.md, ADRs/ | AI Engineer |
| 4 | AI-INTEGRATION.md, model client module, integration tests | AI Engineer |
| 5 | Next.js site (all pages, brand, copy) from auto-website-builder; AI-specific component extensions; integration review report | auto-website-builder → AI Engineer integrates |
| 6 | Backend service, OpenAPI spec, DB migrations, test suite | AI Engineer |
| 7 | QA report, edge case test suite, load test results | AI Engineer + researcher team |
| 8 | RUNBOOK.md, IaC, observability config, cost model | AI Engineer |
| 9 | Engineering handoff doc, research integration guide, open items register | AI Engineer |
| All | ENGINEERING-LOG.md with stage-by-stage entries | AI Engineer |