Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals

Name: integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals
Author: arthur-ai

integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals/SKILL.md

npx skillsauth add arthur-ai/arthur-engine integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Arthur Onboard — Step 9: Recommend & Configure Continuous Evals

Read State

cat .arthur-engine.env 2>/dev/null || echo "(no state file)"

Parse ARTHUR_ENGINE_URL, ARTHUR_API_KEY, ARTHUR_TASK_ID, ARTHUR_EVAL_PROVIDER, ARTHUR_EVAL_MODEL.

Skip if ARTHUR_EVAL_PROVIDER is empty or none — no eval model was configured in Step 8. Exit this skill.

Check Existing Evals

curl -s \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/continuous_evals?page=0&page_size=100" | \
  python3 -c "
import sys, json
d = json.load(sys.stdin)
evals = d.get('evals', [])
for e in evals:
    print(f'  • {e[\"name\"]}')
print(f'COUNT={len(evals)}')
"

If evals already exist, show them and ask if the user wants additional recommendations.

Fetch a Trace for Analysis

TRACE_RESPONSE=$(curl -s \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  "$ARTHUR_ENGINE_URL/api/v1/traces?task_ids=$ARTHUR_TASK_ID&page_size=5")

TRACE_ID=$(echo "$TRACE_RESPONSE" | \
  python3 -c "
import sys, json
d = json.load(sys.stdin)
traces = d.get('traces') or d.get('data') or []
print(traces[0]['trace_id'] if traces else '')
" 2>/dev/null)

if [ -n "$TRACE_ID" ]; then
  curl -s \
    -H "Authorization: Bearer $ARTHUR_API_KEY" \
    "$ARTHUR_ENGINE_URL/api/v1/traces/$TRACE_ID"
fi

Recommend Evals

Using the trace input/output content, recommend 2-4 continuous evals. Choose based on what the trace reveals about the application:

| Application type | Good eval choices | |-----------------|-------------------| | Customer support | Relevance, tone, task completion | | RAG / Q&A | Faithfulness (only if retrieval spans found), relevance | | Code assistant | Correctness, clarity | | General chat | Relevance, toxicity | | Agentic workflows | Task completion, relevance |

Important: Only recommend a faithfulness/hallucination eval if the trace contains retrieval spans (span_kind=RETRIEVER or span name matches retrieve/search/document/rag).

For each recommendation, write specific Jinja2-format instructions:

Evaluate the following LLM interaction for <quality dimension>:

Input: {{ input }}
Output: {{ output }}

<scoring criteria specific to the dimension>

Return a JSON object: {"score": <0.0-1.0 where 1.0 = fully passes>, "reason": "<brief explanation>"}

Show the recommendations with rationale and ask for user approval.

Create Evals (If Approved)

For each recommended eval, create the LLM eval rule:

curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model_name\": \"$EVAL_MODEL\", \"model_provider\": \"$EVAL_PROVIDER\", \"instructions\": \"$INSTRUCTIONS\"}" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/llm_evals/$SLUG"

After all LLM evals are created, create one shared transform:

TRANSFORM_RESPONSE=$(curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"Arthur — Input/Output Extractor\",
    \"definition\": {
      \"variables\": [
        {\"variable_name\": \"input\", \"span_name\": \"$ROOT_SPAN_NAME\", \"attribute_path\": \"attributes.input.value\", \"fallback\": \"\"},
        {\"variable_name\": \"output\", \"span_name\": \"$ROOT_SPAN_NAME\", \"attribute_path\": \"attributes.output.value\", \"fallback\": \"\"}
      ]
    }
  }" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/traces/transforms")

TRANSFORM_ID=$(echo "$TRANSFORM_RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)

Then create a continuous eval for each LLM eval:

curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"$DISPLAY_NAME\",
    \"llm_eval_name\": \"$SLUG\",
    \"llm_eval_version\": \"latest\",
    \"transform_id\": \"$TRANSFORM_ID\",
    \"transform_variable_mapping\": [
      {\"transform_variable\": \"input\", \"eval_variable\": \"input\"},
      {\"transform_variable\": \"output\", \"eval_variable\": \"output\"}
    ],
    \"enabled\": true
  }" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/continuous_evals"

This step is non-blocking — log a warning and continue if it errors.

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals

integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals/SKILL.md

--- name: arthur-onboard-evals description: Arthur onboarding sub-skill — Step 9: Recommend and configure continuous LLM evals for the Arthur task. Reads credentials and eval provider from .arthur-engine.env. allowed-tools: Bash, Read --- # Arthur Onboard — Step 9: Recommend & Configure Continuous Evals ## Read State ```bash cat .arthur-engine.env 2>/dev/null || echo "(no state file)" ``` Parse `ARTHUR_ENGINE_URL`, `ARTHUR_API_KEY`, `ARTHUR_TASK_ID`, `ARTHUR_EVAL_PROVIDER`, `ARTHUR_EVAL_MODE

81 stars

tools

Updated May 28, 2026

$ install --global

skillsauth

npx skillsauth add arthur-ai/arthur-engine integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 28, 2026, 3:43 AM272.6s1 file scanned

SKILL.md

name:: arthur-onboard-evals
description:: Arthur onboarding sub-skill — Step 9: Recommend and configure continuous LLM evals for the Arthur task. Reads credentials and eval provider from .arthur-engine.env.
allowed-tools:: Bash, Read

Arthur Onboard — Step 9: Recommend & Configure Continuous Evals

Read State

cat .arthur-engine.env 2>/dev/null || echo "(no state file)"

Parse ARTHUR_ENGINE_URL, ARTHUR_API_KEY, ARTHUR_TASK_ID, ARTHUR_EVAL_PROVIDER, ARTHUR_EVAL_MODEL.

Skip if ARTHUR_EVAL_PROVIDER is empty or none — no eval model was configured in Step 8. Exit this skill.

Check Existing Evals

curl -s \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/continuous_evals?page=0&page_size=100" | \
  python3 -c "
import sys, json
d = json.load(sys.stdin)
evals = d.get('evals', [])
for e in evals:
    print(f'  • {e[\"name\"]}')
print(f'COUNT={len(evals)}')
"

If evals already exist, show them and ask if the user wants additional recommendations.

Fetch a Trace for Analysis

TRACE_RESPONSE=$(curl -s \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  "$ARTHUR_ENGINE_URL/api/v1/traces?task_ids=$ARTHUR_TASK_ID&page_size=5")

TRACE_ID=$(echo "$TRACE_RESPONSE" | \
  python3 -c "
import sys, json
d = json.load(sys.stdin)
traces = d.get('traces') or d.get('data') or []
print(traces[0]['trace_id'] if traces else '')
" 2>/dev/null)

if [ -n "$TRACE_ID" ]; then
  curl -s \
    -H "Authorization: Bearer $ARTHUR_API_KEY" \
    "$ARTHUR_ENGINE_URL/api/v1/traces/$TRACE_ID"
fi

Recommend Evals

Using the trace input/output content, recommend 2-4 continuous evals. Choose based on what the trace reveals about the application:

Important: Only recommend a faithfulness/hallucination eval if the trace contains retrieval spans (span_kind=RETRIEVER or span name matches retrieve/search/document/rag).

For each recommendation, write specific Jinja2-format instructions:

Evaluate the following LLM interaction for <quality dimension>:

Input: {{ input }}
Output: {{ output }}

<scoring criteria specific to the dimension>

Return a JSON object: {"score": <0.0-1.0 where 1.0 = fully passes>, "reason": "<brief explanation>"}

Show the recommendations with rationale and ask for user approval.

Create Evals (If Approved)

For each recommended eval, create the LLM eval rule:

curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model_name\": \"$EVAL_MODEL\", \"model_provider\": \"$EVAL_PROVIDER\", \"instructions\": \"$INSTRUCTIONS\"}" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/llm_evals/$SLUG"

After all LLM evals are created, create one shared transform:

TRANSFORM_RESPONSE=$(curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"Arthur — Input/Output Extractor\",
    \"definition\": {
      \"variables\": [
        {\"variable_name\": \"input\", \"span_name\": \"$ROOT_SPAN_NAME\", \"attribute_path\": \"attributes.input.value\", \"fallback\": \"\"},
        {\"variable_name\": \"output\", \"span_name\": \"$ROOT_SPAN_NAME\", \"attribute_path\": \"attributes.output.value\", \"fallback\": \"\"}
      ]
    }
  }" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/traces/transforms")

TRANSFORM_ID=$(echo "$TRANSFORM_RESPONSE" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id',''))" 2>/dev/null)

Then create a continuous eval for each LLM eval:

curl -s -X POST \
  -H "Authorization: Bearer $ARTHUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"name\": \"$DISPLAY_NAME\",
    \"llm_eval_name\": \"$SLUG\",
    \"llm_eval_version\": \"latest\",
    \"transform_id\": \"$TRANSFORM_ID\",
    \"transform_variable_mapping\": [
      {\"transform_variable\": \"input\", \"eval_variable\": \"input\"},
      {\"transform_variable\": \"output\", \"eval_variable\": \"output\"}
    ],
    \"enabled\": true
  }" \
  "$ARTHUR_ENGINE_URL/api/v1/tasks/$ARTHUR_TASK_ID/continuous_evals"

This step is non-blocking — log a warning and continue if it errors.

Related Skills

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-verify

tools

VerifiedTrustedCommunity

--- name: arthur-onboard-verify description: Arthur onboarding sub-skill — Step 7: Verify that traces are flowing from the instrumented application to Arthur Engine. Reads credentials from .arthur-engine.env. allowed-tools: Bash, Read --- # Arthur Onboard — Step 7: Verify Instrumentation ## Read State ```bash cat .arthur-engine.env 2>/dev/null || echo "(no state file)" ``` Parse `ARTHUR_ENGINE_URL`, `ARTHUR_API_KEY`, `ARTHUR_TASK_ID`. --- ## Tell the User to Run Their Application Show the

81SKILL.mdUpdated May 28, 2026

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-verify

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-task

tools

VerifiedTrustedCommunity

--- name: arthur-onboard-task description: Arthur onboarding sub-skill — Step 3: Set up an Arthur Task (create or select). Reads/writes .arthur-engine.env. allowed-tools: Bash, Read, Write, Edit --- # Arthur Onboard — Step 3: Set Up Arthur Task **Goal:** Establish `ARTHUR_TASK_ID` in `.arthur-engine.env`. ## Read State ```bash cat .arthur-engine.env 2>/dev/null || echo "(no state file)" ``` Parse `ARTHUR_ENGINE_URL`, `ARTHUR_API_KEY`, and `ARTHUR_TASK_ID` from the output. **State write hel

81SKILL.mdUpdated May 28, 2026

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-task

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-prompts

tools

VerifiedTrustedCommunity

--- name: arthur-onboard-prompts description: Arthur onboarding sub-skill — Step 6: Extract prompts from the target repository and register them with Arthur Engine. Reads credentials from .arthur-engine.env. allowed-tools: Bash, Read, Task --- # Arthur Onboard — Step 6: Extract & Register Prompts ## Read State ```bash cat .arthur-engine.env 2>/dev/null || echo "(no state file)" ``` Parse `ARTHUR_ENGINE_URL`, `ARTHUR_API_KEY`, `ARTHUR_TASK_ID`. --- ## Extract Prompts via Sub-agent Delegate

81SKILL.mdUpdated May 28, 2026

arthur-ai/integrations/claude-code-skills/arthur-onboard/arthur-onboard-prompts

arthur-ai/arthur-onboard-platform

development

VerifiedTrustedCommunity

Onboard an agentic application to the Arthur SaaS Platform (platform.arthur.ai). Guides through authentication, workspace selection, engine deployment, model creation, code instrumentation, trace verification, and eval configuration.

81SKILL.mdUpdated May 28, 2026

arthur-ai/arthur-onboard-platform

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/arthur-ai/arthur-engine.git

# Copy into Claude Code skills folder (global)
cp -r arthur-engine/integrations/claude-code-skills/arthur-onboard/arthur-onboard-evals ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

arthur-ai/arthur-engine

81 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT