kayba-ai/kayba-stage-2-domain-context

Name: kayba-stage-2-domain-context
Author: kayba-ai

ace/cli/skills/kayba-pipeline/stage-2-domain-context/SKILL.md

npx skillsauth add kayba-ai/agentic-context-engine kayba-stage-2-domain-context

Stage 2: Domain Context Gathering

Understand the agent's world — what it does, what tools it has, and what "success" looks like.

Inputs

TRACES_FOLDER — path to directory containing trace JSON files

Process

0. Detect trace format

Before reading traces, identify the framework that produced them. Read 1 trace file and check:

| Signal | Framework |
|--------|-----------|
| info.agent_info.implementation, info.environment_info, simulation.messages[] with role/tool_calls/turn_idx | tau2-bench |
| runs[].steps[] with type: "tool", lc_kwargs | LangChain / LangSmith |
| events[] with event_type, span_id, parent_id | LlamaIndex |
| choices[].message.tool_calls[] at top level | Raw OpenAI API logs |
| trace.spans[] with attributes, trace_id | OpenTelemetry / Arize / Langfuse |

Record the detected format in the output under Trace Format. All subsequent trace-reading steps use the field paths appropriate for that format.

If the format is unrecognized, note the top-level keys and structure, then proceed best-effort with field names found in the data.
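
The format check can be sketched as a chain of key lookups on one parsed trace. This is only an illustration: the function name `detect_trace_format` and its helpers are hypothetical, and it tests just one or two of the signals listed for each framework, so a real trace may need stricter checks.

```python
def _as_dict(value):
    """Treat anything that is not a dict as an empty dict."""
    return value if isinstance(value, dict) else {}


def _dict_items(value):
    """Return only the dict elements of a list field (empty list otherwise)."""
    return [item for item in value if isinstance(item, dict)] if isinstance(value, list) else []


def detect_trace_format(trace: dict) -> str:
    """Map one parsed trace to a framework label using the signals listed above."""
    info = _as_dict(trace.get("info"))
    simulation = _as_dict(trace.get("simulation"))
    if "agent_info" in info or "environment_info" in info or "messages" in simulation:
        return "tau2-bench"
    if any("steps" in run for run in _dict_items(trace.get("runs"))):
        return "LangChain / LangSmith"
    if any("event_type" in event for event in _dict_items(trace.get("events"))):
        return "LlamaIndex"
    if any("message" in choice for choice in _dict_items(trace.get("choices"))):
        return "Raw OpenAI API logs"
    if "spans" in _as_dict(trace.get("trace")):
        return "OpenTelemetry / Arize / Langfuse"
    # Unrecognized: report the top-level keys so the reader can proceed best-effort.
    return "unrecognized (top-level keys: " + ", ".join(sorted(trace)) + ")"
```

The fallback branch mirrors the instruction to note the top-level keys when no known format matches.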

1. Detect architecture

Read 2-3 traces and determine if this is a single-agent or multi-agent system:

- Single agent: one agent_info entry, one conversation thread, tool calls from one identity
- Multi-agent / router: look for multiple agent_info entries, routing tool calls (e.g., transfer_to_*, delegate_to_*), sub-conversation arrays, or distinct system prompts per agent identity

If multi-agent: document each agent separately (name, role, tools, handoff triggers) and note the routing logic. The remaining steps apply per-agent.
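
Two of the multi-agent signals above, routing tool calls and distinct system prompts, lend themselves to a quick heuristic. The sketch below assumes a flat list of message dicts whose tool calls carry a `name` field; `routing_calls` and `guess_architecture` are hypothetical helper names, and the result is a hint to confirm by reading the traces, not a verdict.

```python
import re

# Routing prefixes taken from the examples in the text; extend as needed.
ROUTING_PATTERN = re.compile(r"^(transfer_to_|delegate_to_)")


def routing_calls(messages: list[dict]) -> list[str]:
    """Collect tool-call names that look like handoffs between agents."""
    names = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            name = call.get("name", "")
            if ROUTING_PATTERN.match(name):
                names.append(name)
    return names


def guess_architecture(messages: list[dict]) -> str:
    """Return 'multi-agent' if handoff calls or several distinct system prompts appear."""
    system_prompts = {str(m.get("content")) for m in messages if m.get("role") == "system"}
    if routing_calls(messages) or len(system_prompts) > 1:
        return "multi-agent"
    return "single-agent"
```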

2. Find the system prompt

Use a fallback chain — stop at the first hit:

- Config files: grep YAML/JSON/TOML/Python/JS files for the keys system_prompt, system_message, instructions, AGENT_INSTRUCTION, SYSTEM_PROMPT
- Source code: search for prompt template strings, f-strings, or .format() calls that build the system message (look in agent implementation files)
- Trace extraction: read 3 trace files from {TRACES_FOLDER}:
  - Check info.environment_info.policy (tau2-bench format)
  - Check the first message with role: "system" in the messages array
  - Check raw_data fields for system-level content
- Not found: if none of the above yields a system prompt, explicitly record SYSTEM_PROMPT_STATUS: NOT_FOUND in the output and flag this for the orchestrator. Do not fabricate or guess.

When found, record both the prompt content and its source location (file path + line, or trace field path).
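
The trace-extraction step of the fallback chain could look roughly like the sketch below. It covers only the first two trace checks (the policy field and the first system message); the function name `extract_system_prompt` and the returned dict layout are assumptions for illustration, and the `raw_data` check is omitted because its structure is not specified here.

```python
def extract_system_prompt(trace: dict) -> dict:
    """Try the trace-level locations in order and report where the prompt was found."""
    env = (trace.get("info") or {}).get("environment_info") or {}
    if env.get("policy"):
        return {"status": "FOUND", "source": "info.environment_info.policy", "prompt": env["policy"]}

    # Look for a system message at the top level, then under simulation.
    candidates = [
        ("messages", trace.get("messages")),
        ("simulation.messages", (trace.get("simulation") or {}).get("messages")),
    ]
    for path, messages in candidates:
        for index, message in enumerate(messages or []):
            if message.get("role") == "system" and message.get("content"):
                return {"status": "FOUND", "source": f"{path}[{index}]", "prompt": message["content"]}

    # Nothing found: say so explicitly instead of guessing.
    return {"status": "NOT_FOUND", "source": None, "prompt": None}
```

Returning the field path alongside the prompt keeps the source location available for the findings file.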

3. Extract tool definitions

Two-pass approach: source code first (ground truth), then traces (usage evidence).

Pass 1 — Source code discovery:

- Search for tool/function definition patterns: @tool, @is_tool, def tool_, function schema arrays, OpenAPI specs, tools=[] arguments
- For each tool, extract from source:
  - Name
  - Input parameters with types and defaults
  - Return type / output schema (document the structure, not just "returns a dict")
  - Side effects: READ (no state change), WRITE (mutates state), GENERIC (neither)
  - Validation rules the tool does NOT enforce (critical: grep for comments like "API does not check", "agent must enforce")

Pass 2 — Trace usage evidence:

- Read ALL traces (if <= 20) or a stratified sample (see step 5 for sampling)
- Extract every unique tool_calls[].name from assistant messages
- Extract every role: "tool" response to document actual output shapes
- For each tool, record one example input/output pair from traces

Reconcile the two passes:

- Tools in source but NOT in traces = "available but unused": flag these; they may be relevant for edge cases the agent should handle
- Tools in traces but NOT in source = possible dynamic tools or external APIs: investigate

Output the full tool inventory as a table with columns: Name, Category, Input Schema, Output Schema, Observed in Traces (Y/N), Unvalidated Rules.
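
Reconciling the two passes amounts to set arithmetic over tool names. A minimal sketch, assuming both passes have already been reduced to sets of names (the function name `reconcile_tools` and the result keys are illustrative):

```python
def reconcile_tools(source_tools: set[str], trace_tools: set[str]) -> dict[str, list[str]]:
    """Split tool names into the three reconciliation buckets."""
    return {
        # Defined in source and seen in traces.
        "observed": sorted(source_tools & trace_tools),
        # Defined in source, never called: flag for edge cases.
        "available_but_unused": sorted(source_tools - trace_tools),
        # Called in traces, not defined in source: investigate.
        "in_traces_only": sorted(trace_tools - source_tools),
    }


result = reconcile_tools(
    {"get_user", "get_reservation", "cancel"},
    {"get_user", "cancel", "lookup_weather"},
)
```

Here `result["available_but_unused"]` is `["get_reservation"]` and `result["in_traces_only"]` is `["lookup_weather"]`; `lookup_weather` is a made-up name for the example.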

4. Find domain documentation

- READMEs, product docs, wiki links
- Policy files (e.g., data/*/policy.md, domain-specific docs)
- Inline code comments explaining business logic
- Test files that describe expected behavior
- Anything that explains what the agent does and what "success" means for its users

5. Catalogue agent behavior patterns

Trace selection — stratified sampling (do not just grab "5-10 random traces"):

- Count total traces in {TRACES_FOLDER}. If <= 20, read ALL of them.
- If > 20, select a stratified sample:
  - Sort by termination_reason: include at least 2 per unique reason
  - Sort by conversation length (message count): include the shortest, the longest, and 2 near the median
  - Sort by tool call count: include the lowest and the highest
  - If task outcomes are available (pass/fail), include at least 3 of each
  - Target: ~15 traces total, or 30% of the corpus, whichever is larger
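
The selection rules above can be sketched as follows. The sketch assumes each trace has already been summarized into a dict with the hypothetical keys `id`, `termination_reason`, `message_count`, `tool_call_count`, and `passed`, and it reads the target as a minimum to top up to, which is one interpretation of the rule rather than a stated requirement.

```python
import math
from collections import defaultdict


def stratified_sample(traces: list[dict]) -> list[str]:
    """Return the ids of traces to read, following the rules listed above."""
    if len(traces) <= 20:
        return [t["id"] for t in traces]

    picked: dict[str, None] = {}  # dict used as an insertion-ordered set

    def take(items):
        for t in items:
            picked.setdefault(t["id"])

    # First 2 traces for each unique termination reason.
    by_reason = defaultdict(list)
    for t in traces:
        by_reason[t.get("termination_reason")].append(t)
    for group in by_reason.values():
        take(group[:2])

    # Shortest, longest, and 2 around the median conversation length.
    by_length = sorted(traces, key=lambda t: t["message_count"])
    mid = len(by_length) // 2
    take([by_length[0], by_length[-1], by_length[mid - 1], by_length[mid]])

    # Lowest and highest tool-call counts.
    by_tools = sorted(traces, key=lambda t: t["tool_call_count"])
    take([by_tools[0], by_tools[-1]])

    # Up to 3 passes and 3 fails when outcomes are recorded.
    for outcome in (True, False):
        take([t for t in traces if t.get("passed") is outcome][:3])

    # Top up to the target size: 15 traces or 30% of the corpus, whichever is larger.
    target = max(15, math.ceil(0.3 * len(traces)))
    for t in traces:
        if len(picked) >= target:
            break
        picked.setdefault(t["id"])
    return list(picked)
```

Because the strata overlap, the same trace can satisfy several rules; the ordered-set approach keeps each id once.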

For each selected trace, document:

- Function call frequency: which tools are called most, in what order
- Tool call sequences: common tool chains (e.g., get_user -> get_reservation -> cancel)
- Success patterns: what does a thread that accomplishes its goal look like?
- Failure patterns: what does a thread that fails or gets stuck look like?
- Error patterns: what error strings appear in tool outputs? Group by root cause
- Policy violation patterns: where does the agent break its own rules? (e.g., multiple tool calls per turn, acting without confirmation)
- User feedback signals: reverts, ratings, explicit corrections, escalations, stop tokens, transfer tokens
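
Call frequency and common tool chains can be tallied with counters once each trace is reduced to its ordered list of tool names. This is a sketch under that assumption; `tool_call_stats` is a hypothetical helper, and it counts adjacent pairs only, so longer chains would need a wider window.

```python
from collections import Counter


def tool_call_stats(traces: list[list[str]]) -> tuple[Counter, Counter]:
    """Count individual tool calls and adjacent call pairs across traces."""
    frequency: Counter = Counter()
    chains: Counter = Counter()
    for calls in traces:
        frequency.update(calls)
        # Pair each call with the one that follows it in the same trace.
        chains.update(zip(calls, calls[1:]))
    return frequency, chains


frequency, chains = tool_call_stats([
    ["get_user", "get_reservation", "cancel"],
    ["get_user", "get_reservation"],
])
```

With these two example traces, `chains[("get_user", "get_reservation")]` is 2 and `chains[("get_reservation", "cancel")]` is 1.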

6. Write findings

Write all findings to eval/stage2_domain_context.md:

# Domain Context

## Trace Format
- Framework: [detected framework name]
- Key field paths: [e.g., simulation.messages[], info.environment_info.policy]

## Architecture
- Type: [single-agent | multi-agent]
- [If multi-agent: agent roster with roles and handoff triggers]

## Agent Purpose
[1-2 sentence summary of what this agent does]

## System Prompt
- **Source**: [file path + line, or trace field path, or NOT_FOUND]
- **Status**: [verbatim | reconstructed | not_found]

[The system prompt content, or "NOT_FOUND — downstream stages should account for missing system prompt"]

## Tools
| Tool | Category | Input Schema | Output Schema | In Traces? | Unvalidated Rules |
|------|----------|-------------|---------------|------------|-------------------|
| tool_name | READ/WRITE/GENERIC | `{param: type}` | `{field: type}` | Y/N | "API does not check X" |

### Tools available but never called in traces
- [tool_name — why it matters]

## Domain Rules
[Key business rules, constraints, policies the agent must follow]

## Behavior Patterns

### Success patterns
- [pattern 1]

### Failure patterns
- [pattern 1]

### Policy violation patterns
- [violation with frequency: N/M turns]

### Error patterns
| Error | Frequency | Root cause |
|-------|-----------|------------|
| error string | N traces | cause |

### User feedback signals
- [signal 1]

Outputs

eval/stage2_domain_context.md

Gather domain context about the repository and agent — system prompt, tool definitions, domain docs, and behavior patterns from traces. Trigger when the user says "run stage 2", "gather context", "domain context", or when invoked by the kayba-pipeline orchestrator.

2,170 stars

tools

Updated Apr 27, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Displayed % |
|---------|-------------|--------|-------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 27, 2026, 12:55 PM (51.3s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/kayba-ai/agentic-context-engine.git

# Copy into Claude Code skills folder (global)
cp -r agentic-context-engine/ace/cli/skills/kayba-pipeline/stage-2-domain-context ~/.claude/skills/

Repository

kayba-ai/agentic-context-engine

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT