skills/project-agno-rx/SKILL.md
Specialist evaluation for Agno AI agent projects. Evaluates agent design, tool usage, knowledge/RAG setup, memory management, team coordination, workflow orchestration, deployment readiness, and observability against Agno best practices. Use when building with Agno, auditing agent quality, or when the user says "agno audit", "run project-agno-rx", "evaluate my agents", "agno best practices", or "agent quality check". Measures 10 dimensions (40 sub-metrics) specific to the Agno framework.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add acardozzo/rx-suite project-agno-rx`
Optional: pyright for type-aware analysis (pip install pyright)
Check all dependencies: bash scripts/rx-deps.sh or bash scripts/rx-deps.sh --install
"Is this Agno project using the framework correctly and leveraging all available capabilities for a world-class A+ agentic system?"
This is a SPECIALIST skill for projects built with the Agno AI framework (https://github.com/agno-agi/agno). Unlike general-purpose rx skills, every metric here references specific Agno classes, methods, and patterns.
Detection signals: `agno` imports, `Agent` class usage, `pyproject.toml` with an `agno` dependency.

Before scoring, confirm the project is Agno-based:
1. Check pyproject.toml / requirements.txt / setup.py for `agno` dependency
2. Scan for `from agno.agent import Agent` or similar Agno imports
3. Look for Agent() instantiation patterns
4. Check for agno config files, AGENTS.md, .cursorrules referencing Agno
If Agno is not detected, abort with: "This project does not appear to use the Agno framework. Use project-rx for general project evaluation instead."
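The detection steps above could be approximated with a short script. This is a minimal sketch, not part of the skill itself: the function names `detect_agno` and `is_agno_project`, the regular expressions, and the decision rule are all assumptions made for illustration, and the config-file check from step 4 is left out.

```python
import re
from pathlib import Path

# Assumed patterns for illustration; the skill does not prescribe exact regexes.
AGNO_DEP = re.compile(r"\bagno\b", re.IGNORECASE)
AGNO_IMPORT = re.compile(
    r"^\s*(?:from\s+agno(?:\.\w+)*\s+import|import\s+agno\b)", re.MULTILINE
)
AGENT_CALL = re.compile(r"\bAgent\s*\(")
MANIFESTS = ("pyproject.toml", "requirements.txt", "setup.py")


def detect_agno(root: str) -> dict:
    """Collect the detection signals (steps 1-3) for the project at `root`."""
    base = Path(root)
    signals = {"dependency": False, "imports": False, "agent_calls": False}

    # Step 1: dependency declared in a manifest file
    for name in MANIFESTS:
        manifest = base / name
        if manifest.is_file() and AGNO_DEP.search(manifest.read_text(errors="ignore")):
            signals["dependency"] = True

    # Steps 2 and 3: Agno imports and Agent() instantiation in Python sources
    for path in base.rglob("*.py"):
        text = path.read_text(errors="ignore")
        if AGNO_IMPORT.search(text):
            signals["imports"] = True
        if AGENT_CALL.search(text):
            signals["agent_calls"] = True
    return signals


def is_agno_project(signals: dict) -> bool:
    # Assumption: `Agent(` alone is too generic, so require a dependency or an import.
    return signals["dependency"] or signals["imports"]
```

Note that `rglob` also walks virtual environments and vendored code, so a real scan would likely exclude those directories.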
| Dimension | Weight |
|-----------|--------|
| D1: Agent Design & Configuration | 12% |
| D2: Tool Integration | 12% |
| D3: Knowledge & RAG | 10% |
| D4: Memory & Learning | 10% |
| D5: Team & Multi-Agent | 10% |
| D6: Workflow Orchestration | 8% |
| D7: Model & Provider Management | 10% |
| D8: Safety & Guardrails | 10% |
| D9: Deployment & Runtime | 10% |
| D10: Testing & Evaluation | 8% |
Deploy 5 parallel agents to scan the codebase:
| Agent | Dimensions | What to Scan |
|-------|-----------|--------------|
| Agent 1: Agent Design & Tools | D1 + D2 | Agent() definitions, model config, instructions, output_schema, tool imports, @tool decorators, Toolkit subclasses |
| Agent 2: Knowledge & Memory | D3 + D4 | VectorDb config, knowledge bases, chunking, embedders, MemoryManager setup, memory types, learning patterns |
| Agent 3: Teams & Workflows | D5 + D6 | Team() definitions, coordination modes, Workflow() classes, Step/Steps/Parallel/Loop usage, state management |
| Agent 4: Models & Safety | D7 + D8 | Model provider usage, fallback patterns, streaming config, guardrail definitions, input/output validation, rate limiting |
| Agent 5: Deploy & Test | D9 + D10 | AgentOS/FastAPI setup, database config, env vars, tracing/observability, test files, eval framework, CI/CD |
For every sub-metric in their assigned dimensions:
- NOT_PRESENT (0) -- No evidence whatsoever
- MINIMAL (40) -- Placeholder, stub, or extremely partial
- BASIC (70) -- Works for MVP, missing advanced features
- PRODUCTION (85) -- Solid, covers main use cases
- WORLD_CLASS (100) -- Fully featured, best-in-class

D1: Agent Design & Configuration -- Source: `libs/agno/agno/agent/agent.py`, `AGENTS.md`

- `Agent(`, `name=`, `description=`, `instructions=`, `model=`, `markdown=True`
- `output_schema=`, `response_model=`, Pydantic model classes used with agents
- `num_history_runs=`, `add_history_to_messages=`, `session_id=`, `storage=`

D2: Tool Integration -- Source: `libs/agno/agno/tools/toolkit.py`, built-in tools

- `from agno.tools` imports, using built-in (`DuckDuckGoTools`, `YFinanceTools`, etc.) vs custom
- `cache_results=`, `show_result=`, `confirm=`, tool-level settings
- `@tool` decorator usage, docstrings, type hints, error handling in custom tools
- `register()` method usage, factory functions for dynamic tool sets

D3: Knowledge & RAG -- Source: `libs/agno/agno/knowledge/protocol.py`, vector DB integrations

- `PgVector`, `Pinecone`, `Qdrant`, `Weaviate`, `LanceDb` (not in-memory for prod)
- `SemanticChunking`, `RecursiveChunking`, `AgenticChunking`, `FixedSizeChunking`
- `OpenAIEmbedder`, `OllamaEmbedder`, dimension settings, model selection
- `hybrid_search=True`, `reranker=`, `search_type=`, `limit=`, `score_threshold=`

D4: Memory & Learning -- Source: `libs/agno/agno/memory/manager.py`

- `MemoryManager(`, `db=`, PostgreSQL/SQLite backend
- `create_user_memories=True`, `create_session_summary=True`, entity memories
- `update_user_memories_after_run=`, summarization config, memory cleanup
- `learnings=` context injection

D5: Team & Multi-Agent -- Source: `libs/agno/agno/team/team.py`

- `Team(`, `mode="broadcast"`/`"router"`/`"coordinate"`, appropriate mode for use case
- `name`, `role`, `instructions`, `tools`
- `shared_memory=`, distributed knowledge, session context sharing
- `instructions`, delegation rules, `enable_agentic_context=True`

D6: Workflow Orchestration -- Source: `libs/agno/agno/workflow/workflow.py`

- `Workflow(`, `Step`, `Steps`, `Parallel`, `Loop`, `Condition`, `Router`
- `on_error=`, retry logic, `OnError` handlers, fallback steps
- `input_required=`, approval gates at steps
- `session_state`, state passing between steps, `storage=` for persistence

D7: Model & Provider Management -- Source: Agno Models system

- `"openai:gpt-4"` string format or `OpenAIResponses(id=...)`, not raw API calls
- `model_fallback=`
- `stream=True`, `stream_intermediate_steps=True`, response format config

D8: Safety & Guardrails -- Source: `libs/agno/agno/guardrails/`

- `input_guardrails=`, prompt injection detection, PII filtering
- `output_guardrails=`, content moderation, response validation
- `confirm=True` on dangerous tools, approval workflows, tool-level access control
- `max_tokens=`

D9: Deployment & Runtime -- Source: `libs/agno/agno/os/app.py`, FastAPI patterns

- `AgnoApi(`, FastAPI integration, proper routing, health endpoints
- `os.getenv()`, `.env` files, no hardcoded API keys/credentials
- `monitoring=True`, Langfuse integration, OpenTelemetry, structured logging

D10: Testing & Evaluation -- Source: `libs/agno/agno/eval/`, cookbook patterns

- pytest files testing agent behavior, mock tools, response validation
- `Eval`, `AccuracyEval`, `ReliabilityEval`, LLM-as-judge scoring
- `./scripts/format.sh`, `./scripts/validate.sh`, pre-commit

| Score | Level | Meaning |
|-------|-------|---------|
| 100 | World-class | Fully leverages Agno capabilities, best-in-class patterns |
| 85 | Production-ready | Solid Agno usage, covers main use cases properly |
| 70 | Basic / MVP | Uses Agno correctly for basics, missing advanced features |
| 40 | Minimal | Agno imported but barely configured, default everything |
| 0 | Not present | Feature not used at all |
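The evidence patterns listed for each dimension lend themselves to a first-pass text search before any deeper review. The sketch below counts occurrences of some D1 patterns in a source string. The group labels (`agent_definition`, `structured_io`, `context`) and the function name `count_evidence` are hypothetical, and the regexes are an assumed translation of the listed keywords, not something the skill specifies.

```python
import re
from collections import Counter

# Keyword arguments taken from the D1 evidence list; group names are hypothetical labels.
D1_PATTERNS = {
    "agent_definition": [r"\bAgent\(", r"\binstructions\s*=", r"\bmodel\s*="],
    "structured_io": [r"\boutput_schema\s*=", r"\bresponse_model\s*="],
    "context": [
        r"\bnum_history_runs\s*=",
        r"\badd_history_to_messages\s*=",
        r"\bsession_id\s*=",
        r"\bstorage\s*=",
    ],
}


def count_evidence(source: str, patterns: dict = D1_PATTERNS) -> Counter:
    """Count regex hits per pattern group in one source file's text."""
    hits = Counter()
    for group, regexes in patterns.items():
        hits[group] = sum(len(re.findall(rx, source)) for rx in regexes)
    return hits


sample = 'agent = Agent(name="a", model=m, instructions="x", output_schema=Out)'
print(count_evidence(sample))
```

A hit count is only a starting point: the scoring rules still require reading the matched code to decide between the 40, 70, 85, and 100 levels.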
Each dimension has 4 sub-metrics, equally weighted (25% each within the dimension):
dimension_score = (M_x.1 + M_x.2 + M_x.3 + M_x.4) / 4
overall_score = sum(dimension_score_i * weight_i) for i in 1..10
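As a worked illustration of the two formulas, the following sketch computes a dimension score and the weighted overall score. The weights come from the dimension table; the names `WEIGHTS_PCT`, `dimension_score`, and `overall_score` are chosen here for illustration, and weights are kept as integer percentages to sidestep floating-point drift.

```python
# Dimension weights in percent, as listed in the weight table (they sum to 100).
WEIGHTS_PCT = {
    "D1": 12, "D2": 12, "D3": 10, "D4": 10, "D5": 10,
    "D6": 8, "D7": 10, "D8": 10, "D9": 10, "D10": 8,
}


def dimension_score(sub_scores: list) -> float:
    """Average of the four equally weighted sub-metric scores."""
    if len(sub_scores) != 4:
        raise ValueError("each dimension has exactly 4 sub-metrics")
    return sum(sub_scores) / 4


def overall_score(scores_by_dimension: dict) -> float:
    """Weighted sum of dimension scores on a 0-100 scale."""
    total = sum(
        dimension_score(scores_by_dimension[dim]) * weight
        for dim, weight in WEIGHTS_PCT.items()
    )
    return total / 100


# Sub-metrics of 100, 85, 70, and 85 average to a dimension score of 85.0.
print(dimension_score([100, 85, 70, 85]))
```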
| Grade | Range | Meaning |
|-------|-------|---------|
| A+ | 97-100 | Exemplary Agno usage |
| A | 93-96 | Excellent, near-complete framework leverage |
| A- | 90-92 | Very strong |
| B+ | 87-89 | Strong |
| B | 83-86 | Good framework usage |
| B- | 80-82 | Above average |
| C+ | 77-79 | Fair |
| C | 73-76 | Adequate for early stage |
| C- | 70-72 | Below expectations |
| D+ | 67-69 | Significant gaps |
| D | 63-66 | Many missed Agno capabilities |
| D- | 60-62 | Barely using the framework |
| F | 0-59 | Critical -- not leveraging Agno properly |
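The grade table can be read as a list of lower bounds. The ranges are given as integers, so the table does not say how a fractional score such as 96.5 should be graded. The sketch below assumes a score earns the highest grade whose lower bound it meets, which is one possible reading and not something the table confirms. The names `GRADE_FLOORS` and `grade_for` are hypothetical.

```python
# Lower bound of each grade band, highest first. Anything below 60 is an F.
GRADE_FLOORS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def grade_for(score: float) -> str:
    """Map a 0-100 score to a letter grade by lower bound (assumed rule)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"
```

Under this reading, a score of 96.5 grades as A rather than A+, since it has not reached the 97 floor.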
================================================================
PROJECT-AGNO-RX QUALITY SCORECARD
Project: {project_name}
Framework: Agno {version}
Date: {date}
================================================================
OVERALL SCORE: {score}/100 ({grade})
┌─────────────────────────────────────────┬───────┬───────┐
│ Dimension                               │ Score │ Grade │
├─────────────────────────────────────────┼───────┼───────┤
│ D1: Agent Design & Config (12%)         │  {s}  │  {g}  │
│   M1.1 Agent Definition Quality         │  {s}  │       │
│   M1.2 Structured I/O                   │  {s}  │       │
│   M1.3 Context Management               │  {s}  │       │
│   M1.4 Agent Reuse                      │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D2: Tool Integration (12%)              │  {s}  │  {g}  │
│   M2.1 Tool Selection                   │  {s}  │       │
│   M2.2 Tool Configuration               │  {s}  │       │
│   M2.3 Custom Tool Quality              │  {s}  │       │
│   M2.4 Tool Composition                 │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D3: Knowledge & RAG (10%)               │  {s}  │  {g}  │
│   M3.1 Vector Store Setup               │  {s}  │       │
│   M3.2 Chunking Strategy                │  {s}  │       │
│   M3.3 Embedding Configuration          │  {s}  │       │
│   M3.4 Search Quality                   │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D4: Memory & Learning (10%)             │  {s}  │  {g}  │
│   M4.1 Memory Manager Setup             │  {s}  │       │
│   M4.2 Memory Types Used                │  {s}  │       │
│   M4.3 Memory Optimization              │  {s}  │       │
│   M4.4 Learning Integration             │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D5: Team & Multi-Agent (10%)            │  {s}  │  {g}  │
│   M5.1 Team Design                      │  {s}  │       │
│   M5.2 Member Specialization            │  {s}  │       │
│   M5.3 Shared Resources                 │  {s}  │       │
│   M5.4 Coordination Quality             │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D6: Workflow Orchestration (8%)         │  {s}  │  {g}  │
│   M6.1 Workflow Structure               │  {s}  │       │
│   M6.2 Error Handling                   │  {s}  │       │
│   M6.3 Human-in-the-Loop                │  {s}  │       │
│   M6.4 State Management                 │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D7: Model & Provider Mgmt (10%)         │  {s}  │  {g}  │
│   M7.1 Model Selection                  │  {s}  │       │
│   M7.2 Provider Abstraction             │  {s}  │       │
│   M7.3 Fallback & Redundancy            │  {s}  │       │
│   M7.4 Streaming & Performance          │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D8: Safety & Guardrails (10%)           │  {s}  │  {g}  │
│   M8.1 Input Guardrails                 │  {s}  │       │
│   M8.2 Output Guardrails                │  {s}  │       │
│   M8.3 Tool Guardrails                  │  {s}  │       │
│   M8.4 Rate Limiting & Cost             │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D9: Deployment & Runtime (10%)          │  {s}  │  {g}  │
│   M9.1 AgentOS Setup                    │  {s}  │       │
│   M9.2 Database Configuration           │  {s}  │       │
│   M9.3 Environment Configuration        │  {s}  │       │
│   M9.4 Observability                    │  {s}  │       │
├─────────────────────────────────────────┼───────┼───────┤
│ D10: Testing & Evaluation (8%)          │  {s}  │  {g}  │
│   M10.1 Agent Tests                     │  {s}  │       │
│   M10.2 Evaluation Framework            │  {s}  │       │
│   M10.3 Cookbook / Examples             │  {s}  │       │
│   M10.4 CI Integration                  │  {s}  │       │
└─────────────────────────────────────────┴───────┴───────┘
For every sub-metric scoring below 85, output:
================================================================
IMPROVEMENT PLAN
================================================================
Priority Legend: [BLOCKER] [CRITICAL] [HIGH] [MEDIUM] [LOW]
Effort Legend: S (< 1 day) M (1-3 days) L (3-7 days) XL (1-2 weeks)
┌────┬───────────────────────────────┬──────────┬────────┬─────────────────────────────────────────────┐
│ #  │ Sub-Metric                    │ Priority │ Effort │ Agno-Native Solution                        │
├────┼───────────────────────────────┼──────────┼────────┼─────────────────────────────────────────────┤
│ 1  │ M3.1 Vector Store Setup       │ CRITICAL │ M      │ Switch to PgVector with pgvector extension  │
│ 2  │ M8.1 Input Guardrails         │ HIGH     │ S      │ Add input_guardrails=[ModerateInput()]      │
│ ...│                               │          │        │                                             │
└────┴───────────────────────────────┴──────────┴────────┴─────────────────────────────────────────────┘
| Condition | Priority |
|-----------|----------|
| Score 0 in a dimension weighted >= 12% | BLOCKER |
| Score 0 in a dimension weighted >= 10% | CRITICAL |
| Score 40 in a dimension weighted >= 10% | HIGH |
| Score 40 in a dimension weighted >= 8% | MEDIUM |
| Score 70 in any dimension | LOW |
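The priority conditions can be expressed as a small lookup, checked from most to least severe. This is a sketch under stated assumptions: the function name `priority_for` is hypothetical, and because the table does not cover a score of 0 in an 8%-weighted dimension, the sketch returns `None` for that case and leaves the decision to the evaluator.

```python
from typing import Optional


def priority_for(score: int, weight_pct: int) -> Optional[str]:
    """Return the priority label for a sub-metric, or None if the table is silent.

    `score` is one of the level values (0, 40, 70, 85, 100) and `weight_pct`
    is the weight of the parent dimension in percent.
    """
    if score == 0:
        if weight_pct >= 12:
            return "BLOCKER"
        if weight_pct >= 10:
            return "CRITICAL"
        return None  # not covered by the table (e.g. score 0 at 8% weight)
    if score == 40:
        if weight_pct >= 10:
            return "HIGH"
        if weight_pct >= 8:
            return "MEDIUM"
        return None
    if score == 70:
        return "LOW"
    return None  # 85 and above are not part of the improvement plan
```

For example, `priority_for(0, 12)` yields `BLOCKER` for a missing D1 or D2 sub-metric, while `priority_for(40, 8)` yields `MEDIUM` for a minimal D6 or D10 sub-metric.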
Sort items by:
After the scorecard, provide:
List the top 3-5 improvements that can be done in under a day using Agno built-in features.
Group larger improvements into logical phases (e.g., "Add RAG pipeline", "Implement team coordination", "Production hardening").
List Agno capabilities the project is not using at all that could add value.
Flag any patterns that contradict Agno best practices (agents in loops, raw API calls instead of model abstraction, in-memory storage in production).
Always confirm Agno usage first. If the project does not use Agno, abort immediately. Do not force-fit this evaluation on non-Agno projects.
Evidence-based scoring only. Every score must cite specific files, imports, and code patterns. No guessing. If uncertain, score lower and note the uncertainty.
Score 0 means the feature is not used. The Agno capability exists but the project does not use it at all. Document what you searched for.
Score 40 requires visible code. A TODO comment or empty file counts as 0. There must be actual Agno class instantiation, even if minimal (e.g., Agent() with no instructions).
Score 70 requires working functionality. The Agno feature is used correctly for the basic case. An agent with name, model, and instructions but no tools scores 70 on M1.1.
Score 85 requires production patterns. Error handling, configuration externalized, appropriate model selection, tested behavior.
Score 100 is rare and earned. Must demonstrate advanced Agno patterns, comprehensive use of framework features, monitoring, documentation. Almost never given.
Agno-native solutions FIRST. Before recommending any external library, check if Agno has a built-in solution. Agno has 100+ built-in tools, 15+ vector DBs, 40+ model providers, guardrails, evals, and more. Always reference the built-in option.
Never inflate scores for potential. Score what EXISTS, not what is planned or easy to add. The improvement plan handles what to add next.
The improvement plan is mandatory. Even if the project scores well, there are always Agno features that could be leveraged further. Always produce the improvement plan.
Parallel agents must not overlap. Each agent scans only their assigned dimensions. No duplicate scanning.
Check AGENTS.md anti-patterns. Specifically flag: agents created in loops, not reusing agent instances, using SQLite in production, making raw LLM API calls instead of using Agno's model abstraction.
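One of these anti-patterns, agents created in loops, can be flagged statically with Python's standard `ast` module. The sketch below is an illustrative heuristic, not part of the skill: the function name `agents_created_in_loops` is hypothetical, it only matches calls spelled `Agent(...)` or `<something>.Agent(...)`, and it does not look inside comprehensions or follow function calls made from a loop body.

```python
import ast
from typing import List


def agents_created_in_loops(source: str) -> List[int]:
    """Return sorted line numbers of Agent(...) calls found inside for/while loops."""
    tree = ast.parse(source)
    flagged = set()
    for loop in ast.walk(tree):
        if not isinstance(loop, (ast.For, ast.AsyncFor, ast.While)):
            continue
        # Walk everything nested under the loop node, including nested loops.
        for node in ast.walk(loop):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            is_agent_name = isinstance(func, ast.Name) and func.id == "Agent"
            is_agent_attr = isinstance(func, ast.Attribute) and func.attr == "Agent"
            if is_agent_name or is_agent_attr:
                flagged.add(node.lineno)
    return sorted(flagged)
```

Because the source is only parsed and never imported, the check runs without Agno installed. Any flagged line still needs a manual look, since constructing agents in a loop can be intentional (for example, when building a fixed set of team members once at startup).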
Effort estimates are for reaching score 70. The effort to go from current state to basic working implementation, not to world-class.
Respect the project's deployment context. If it is a simple script/demo, do not penalize for missing production deployment (D9). Adjust expectations based on project maturity signals.
Cross-reference with Agno cookbook. When suggesting improvements, reference relevant Agno cookbook examples or documentation patterns where applicable.
Use LSP when available. If LSP tools are active (pyright for Python), leverage them for deeper analysis beyond grep:
After generating the scorecard and saving the report to docs/audits/:
1. Save a copy of the report to `docs/rx-plans/{this-skill-name}/{date}-report.md`
2. Invoke the `rx-plan` skill to create or update the improvement plan at `docs/rx-plans/{this-skill-name}/{dimension}/v{N}-{date}-plan.md`
3. Update `docs/rx-plans/{this-skill-name}/summary.md` with current scores
4. Update `docs/rx-plans/dashboard.md` with overall progress

This happens automatically; the user does not need to run `/rx-plan` separately.
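The path templates above can be filled in with `pathlib`. This is a small sketch only: the function name `rx_plan_paths` is hypothetical, and the date format and dimension directory name are not specified by the templates, so the caller supplies them as plain strings.

```python
from pathlib import Path


def rx_plan_paths(skill: str, dimension: str, version: int, date: str) -> dict:
    """Build the report, plan, summary, and dashboard paths from the templates."""
    root = Path("docs/rx-plans")
    return {
        "report": root / skill / f"{date}-report.md",
        "plan": root / skill / dimension / f"v{version}-{date}-plan.md",
        "summary": root / skill / "summary.md",
        "dashboard": root / "dashboard.md",
    }
```

Keeping the templates in one function makes it harder for the report, plan, and summary locations to drift apart between runs.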