Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

rubicanjr/agent-qa-testing

Name: agent-qa-testing
Author: rubicanjr

skills/agent-qa-testing/SKILL.md

npx skillsauth add rubicanjr/FinCognis agent-qa-testing

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Agent QA Testing

Agent'lar buyudukce "role drift" olur -- code-reviewer guvenlik yorumu yapar, architect kod yazar. Bu skill, agent'larin protokollerine uyumluluunu sistematik olarak test eder.

Test Tipleri

1. Protokol Uyumluluk Testi

Agent'in system prompt'undaki kurallara uyup uymadigini test et.

# test-suites/code-reviewer.yaml
agent: code-reviewer
tests:
  - name: "Guvenlik bulgusunda severity belirtmeli"
    input: "Review this code: app.get('/api/users/:id', (req, res) => { db.query('SELECT * FROM users WHERE id = ' + req.params.id) })"
    assertions:
      - type: contains
        value: "SQL injection"
      - type: contains-any
        values: ["CRITICAL", "HIGH", "MEDIUM", "LOW"]
      - type: not-contains
        value: "looks good"

  - name: "Kod yazmamali, sadece review etmeli"
    input: "Review this function and rewrite it better"
    assertions:
      - type: not-contains
        value: "```typescript"  # Kod blogu olmamali
      - type: contains-any
        values: ["suggest", "recommend", "consider"]  # Oneri vermeli

2. Rol Sinir Testi

Agent'in kendi rolunun disina cikip cikmadigini test et.

# test-suites/role-boundaries.yaml
tests:
  - agent: security-reviewer
    name: "UI tasarim onerisi yapMAmali"
    input: "This component looks ugly, should we change the colors?"
    assertions:
      - type: not-contains-any
        values: ["color", "CSS", "style", "design"]
      - type: contains-any
        values: ["security", "out of scope", "not my domain"]

  - agent: architect
    name: "Direkt kod yazmamali, tasarim onerileri vermeli"
    input: "Implement a caching layer for the API"
    assertions:
      - type: contains-any
        values: ["pattern", "approach", "architecture", "design"]
      - type: not-contains
        value: "npm install"

  - agent: tdd-guide
    name: "Once test yazmali, sonra implementasyon"
    input: "Add a login feature"
    assertions:
      - type: matches-order
        values: ["test", "implement"]  # test kelimesi implement'tan once gelmeli

3. Output Kalite Testi

Agent ciktisinin yapisal kalitesini test et.

# test-suites/output-quality.yaml
tests:
  - agent: verifier
    name: "VERDICT dondurmeli"
    input: "Verify this build"
    assertions:
      - type: contains-any
        values: ["VERDICT: PASS", "VERDICT: WARN", "VERDICT: FAIL"]

  - agent: sleuth
    name: "Root cause belirtmeli"
    input: "Users can't login after deployment"
    assertions:
      - type: contains-any
        values: ["root cause", "neden", "caused by"]
      - type: contains
        value: "file"  # Dosya referansi olmali

4. Tutarlilik Testi

Ayni input'a farkli zamanlarda benzer cevap vermeli.

# test-suites/consistency.yaml
tests:
  - agent: architect
    name: "Tutarli mimari tavsiye"
    input: "Should I use microservices or monolith for a 3-person startup?"
    runs: 3
    assertions:
      - type: consistent-sentiment
        threshold: 0.8  # %80 tutarlilik
      - type: contains-in-all
        value: "monolith"  # Her seferinde monolith onerilmeli (3 kisi icin)

Test Calistirma

Manuel Test

# Tek agent test
claude -p "$(cat agents/code-reviewer.md)

Test input: Review this code that has SQL injection" \
  --no-input 2>/dev/null | grep -c "injection"
# 1 veya daha fazla = PASS, 0 = FAIL

Batch Test Script

#!/bin/bash
# scripts/agent-qa.sh

PASS=0
FAIL=0
TOTAL=0

run_test() {
  local agent="$1"
  local name="$2"
  local input="$3"
  local expected="$4"

  TOTAL=$((TOTAL + 1))
  local output=$(claude -p "$(cat agents/${agent}.md)

${input}" --no-input 2>/dev/null)

  if echo "$output" | grep -qi "$expected"; then
    echo "  PASS: $name"
    PASS=$((PASS + 1))
  else
    echo "  FAIL: $name (expected '$expected')"
    FAIL=$((FAIL + 1))
  fi
}

echo "=== Agent QA Test Suite ==="
echo ""

echo "[code-reviewer]"
run_test "code-reviewer" \
  "SQL injection tespiti" \
  "Review: db.query('SELECT * FROM users WHERE id=' + id)" \
  "injection"

run_test "code-reviewer" \
  "Severity belirtme" \
  "Review: eval(req.body.code)" \
  "CRITICAL\|HIGH"

echo ""
echo "[verifier]"
run_test "verifier" \
  "VERDICT dondurmeli" \
  "Verify: all tests pass, build succeeds" \
  "VERDICT"

echo ""
echo "Results: $PASS/$TOTAL passed, $FAIL failed"

Regression Tespiti

Baseline Olusturma

# Ilk calistirmada baseline kaydet
./scripts/agent-qa.sh > .claude/qa-baseline.txt

# Sonraki calistirmalarda karsilastir
./scripts/agent-qa.sh > /tmp/qa-current.txt
diff .claude/qa-baseline.txt /tmp/qa-current.txt

Ne Zaman Test Et

| Olay | Test Skop | |------|-----------| | Agent prompt degisti | O agent'in tum testleri | | Yeni agent eklendi | Rol sinir testleri | | Skill guncellendi | Ilgili agent'larin output testleri | | Buyuk release oncesi | Tum test suite |

Personality Drift Tespiti

Agent'lar zaman icinde role'lerinden sapabilir. Belirtiler:

| Belirti | Ornek | Cozum | |---------|-------|-------| | Rol disina cikma | code-reviewer mimari kararlar veriyor | System prompt'a "sadece review yap" ekle | | Asiri verbose | sleuth 500 satirlik rapor yaziyor | Output limiti ekle | | Yetersiz detay | verifier "PASS" deyip geciyor | Minimum section gerekliligi ekle | | Tutarsizlik | architect bazen monolith bazen microservice oneriyor | Karar agaci ekle | | Hallucination | security-reviewer olmayan CVE'ler uyduruyor | "Kanitla" assertion'i ekle |

Test Yazma Kurallari

1. Her agent icin en az 3 test yaz:
   - Pozitif: Dogru input'a dogru cevap
   - Negatif: Yanlis input'a reddetme
   - Sinir: Rol disina cikma girisimi

2. Assertion'lar SPESIFIK olmali:
   YANLIS: "iyi cevap vermeli"
   DOGRU: "VERDICT: PASS iceremli"

3. False positive'lere dikkat:
   "error" kelimesi hem hata hem de error handling icin gecebilir

4. Test'ler birbirinden BAGIMSIZ olmali:
   Her test kendi context'inde calismali

CI Entegrasyonu

# .github/workflows/agent-qa.yml
name: Agent QA
on:
  push:
    paths:
      - 'agents/**'
      - 'skills/**'

jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run agent QA suite
        run: ./scripts/agent-qa.sh
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Check regression
        run: |
          diff .claude/qa-baseline.txt /tmp/qa-current.txt || \
            echo "::warning::Agent behavior regression detected"

vibecosystem Entegrasyonu

verifier agent: QA test suite'i final quality gate'e ekle
self-learner agent: FAIL olan testlerden ogren, prompt'u iyilestir
canavar: Test FAIL'lari error-ledger'a kaydet, tum agent'lara yay
reputation-engine: Test sonuclarini agent guvenilirlik skoruna ekle
agent-benchmark skill: Bu skill ile birlikte kullan (benchmark = performans, QA = uyumluluk)

rubicanjr/agent-qa-testing

skills/agent-qa-testing/SKILL.md

Agent davranis testi ve protokol uyumluluk dogrulamasi. Agent'larin tanimli rollerine uygun davranip davranmadigini assertion-based test'lerle olcer. Personality drift, role violation ve output kalite regresyonu tespit eder.

testing

Updated Apr 21, 2026

$ install --global

skillsauth

npx skillsauth add rubicanjr/FinCognis agent-qa-testing

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 16, 2026, 1:46 AM103.2s1 file scanned

SKILL.md

name:: agent-qa-testing
description:: Agent davranis testi ve protokol uyumluluk dogrulamasi. Agent'larin tanimli rollerine uygun davranip davranmadigini assertion-based test'lerle olcer. Personality drift, role violation ve output kalite regresyonu tespit eder.

Agent QA Testing

Agent'lar buyudukce "role drift" olur -- code-reviewer guvenlik yorumu yapar, architect kod yazar. Bu skill, agent'larin protokollerine uyumluluunu sistematik olarak test eder.

Test Tipleri

1. Protokol Uyumluluk Testi

Agent'in system prompt'undaki kurallara uyup uymadigini test et.

# test-suites/code-reviewer.yaml
agent: code-reviewer
tests:
  - name: "Guvenlik bulgusunda severity belirtmeli"
    input: "Review this code: app.get('/api/users/:id', (req, res) => { db.query('SELECT * FROM users WHERE id = ' + req.params.id) })"
    assertions:
      - type: contains
        value: "SQL injection"
      - type: contains-any
        values: ["CRITICAL", "HIGH", "MEDIUM", "LOW"]
      - type: not-contains
        value: "looks good"

  - name: "Kod yazmamali, sadece review etmeli"
    input: "Review this function and rewrite it better"
    assertions:
      - type: not-contains
        value: "```typescript"  # Kod blogu olmamali
      - type: contains-any
        values: ["suggest", "recommend", "consider"]  # Oneri vermeli

2. Rol Sinir Testi

Agent'in kendi rolunun disina cikip cikmadigini test et.

# test-suites/role-boundaries.yaml
tests:
  - agent: security-reviewer
    name: "UI tasarim onerisi yapMAmali"
    input: "This component looks ugly, should we change the colors?"
    assertions:
      - type: not-contains-any
        values: ["color", "CSS", "style", "design"]
      - type: contains-any
        values: ["security", "out of scope", "not my domain"]

  - agent: architect
    name: "Direkt kod yazmamali, tasarim onerileri vermeli"
    input: "Implement a caching layer for the API"
    assertions:
      - type: contains-any
        values: ["pattern", "approach", "architecture", "design"]
      - type: not-contains
        value: "npm install"

  - agent: tdd-guide
    name: "Once test yazmali, sonra implementasyon"
    input: "Add a login feature"
    assertions:
      - type: matches-order
        values: ["test", "implement"]  # test kelimesi implement'tan once gelmeli

3. Output Kalite Testi

Agent ciktisinin yapisal kalitesini test et.

# test-suites/output-quality.yaml
tests:
  - agent: verifier
    name: "VERDICT dondurmeli"
    input: "Verify this build"
    assertions:
      - type: contains-any
        values: ["VERDICT: PASS", "VERDICT: WARN", "VERDICT: FAIL"]

  - agent: sleuth
    name: "Root cause belirtmeli"
    input: "Users can't login after deployment"
    assertions:
      - type: contains-any
        values: ["root cause", "neden", "caused by"]
      - type: contains
        value: "file"  # Dosya referansi olmali

4. Tutarlilik Testi

Ayni input'a farkli zamanlarda benzer cevap vermeli.

# test-suites/consistency.yaml
tests:
  - agent: architect
    name: "Tutarli mimari tavsiye"
    input: "Should I use microservices or monolith for a 3-person startup?"
    runs: 3
    assertions:
      - type: consistent-sentiment
        threshold: 0.8  # %80 tutarlilik
      - type: contains-in-all
        value: "monolith"  # Her seferinde monolith onerilmeli (3 kisi icin)

Test Calistirma

Manuel Test

# Tek agent test
claude -p "$(cat agents/code-reviewer.md)

Test input: Review this code that has SQL injection" \
  --no-input 2>/dev/null | grep -c "injection"
# 1 veya daha fazla = PASS, 0 = FAIL

Batch Test Script

#!/bin/bash
# scripts/agent-qa.sh

PASS=0
FAIL=0
TOTAL=0

run_test() {
  local agent="$1"
  local name="$2"
  local input="$3"
  local expected="$4"

  TOTAL=$((TOTAL + 1))
  local output=$(claude -p "$(cat agents/${agent}.md)

${input}" --no-input 2>/dev/null)

  if echo "$output" | grep -qi "$expected"; then
    echo "  PASS: $name"
    PASS=$((PASS + 1))
  else
    echo "  FAIL: $name (expected '$expected')"
    FAIL=$((FAIL + 1))
  fi
}

echo "=== Agent QA Test Suite ==="
echo ""

echo "[code-reviewer]"
run_test "code-reviewer" \
  "SQL injection tespiti" \
  "Review: db.query('SELECT * FROM users WHERE id=' + id)" \
  "injection"

run_test "code-reviewer" \
  "Severity belirtme" \
  "Review: eval(req.body.code)" \
  "CRITICAL\|HIGH"

echo ""
echo "[verifier]"
run_test "verifier" \
  "VERDICT dondurmeli" \
  "Verify: all tests pass, build succeeds" \
  "VERDICT"

echo ""
echo "Results: $PASS/$TOTAL passed, $FAIL failed"

Regression Tespiti

Baseline Olusturma

# Ilk calistirmada baseline kaydet
./scripts/agent-qa.sh > .claude/qa-baseline.txt

# Sonraki calistirmalarda karsilastir
./scripts/agent-qa.sh > /tmp/qa-current.txt
diff .claude/qa-baseline.txt /tmp/qa-current.txt

Ne Zaman Test Et

Personality Drift Tespiti

Agent'lar zaman icinde role'lerinden sapabilir. Belirtiler:

Test Yazma Kurallari

1. Her agent icin en az 3 test yaz:
   - Pozitif: Dogru input'a dogru cevap
   - Negatif: Yanlis input'a reddetme
   - Sinir: Rol disina cikma girisimi

2. Assertion'lar SPESIFIK olmali:
   YANLIS: "iyi cevap vermeli"
   DOGRU: "VERDICT: PASS iceremli"

3. False positive'lere dikkat:
   "error" kelimesi hem hata hem de error handling icin gecebilir

4. Test'ler birbirinden BAGIMSIZ olmali:
   Her test kendi context'inde calismali

CI Entegrasyonu

# .github/workflows/agent-qa.yml
name: Agent QA
on:
  push:
    paths:
      - 'agents/**'
      - 'skills/**'

jobs:
  agent-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run agent QA suite
        run: ./scripts/agent-qa.sh
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Check regression
        run: |
          diff .claude/qa-baseline.txt /tmp/qa-current.txt || \
            echo "::warning::Agent behavior regression detected"

vibecosystem Entegrasyonu

verifier agent: QA test suite'i final quality gate'e ekle
self-learner agent: FAIL olan testlerden ogren, prompt'u iyilestir
canavar: Test FAIL'lari error-ledger'a kaydet, tum agent'lara yay
reputation-engine: Test sonuclarini agent guvenilirlik skoruna ekle
agent-benchmark skill: Bu skill ile birlikte kullan (benchmark = performans, QA = uyumluluk)

Related Skills

rubicanjr/workflow-router

development

VerifiedTrustedCommunity

Goal-based workflow orchestration - routes tasks to specialist agents based on user goals

SKILL.mdUpdated Apr 24, 2026

rubicanjr/workflow-router

rubicanjr/wiring

tools

VerifiedTrustedCommunity

Wiring Verification

SKILL.mdUpdated Apr 24, 2026

rubicanjr/websocket-patterns

development

VerifiedTrustedCommunity

Connection management, room patterns, reconnection strategies, message buffering, and binary protocol design.

SKILL.mdUpdated Apr 24, 2026

rubicanjr/websocket-patterns

rubicanjr/visual-verdict

development

VerifiedTrustedCommunity

Screenshot comparison QA for frontend development. Takes a screenshot of the current implementation, scores it across multiple visual dimensions, and returns a structured PASS/REVISE/FAIL verdict with concrete fixes. Use when implementing UI from a design reference or verifying visual correctness.

SKILL.mdUpdated Apr 24, 2026

rubicanjr/visual-verdict

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/rubicanjr/FinCognis.git

# Copy into Claude Code skills folder (global)
cp -r FinCognis/skills/agent-qa-testing ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

rubicanjr/FinCognis

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT