skills/evaluate-agent-framework/SKILL.md
Assess an open-source agent framework for investment readiness by evaluating community health, supersession risk, architecture alignment, and governance sustainability. Produces a four-tier classification (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) to guide resource allocation decisions before committing engineering effort.
Install globally: npx skillsauth add pjt222/agent-almanac evaluate-agent-framework (works with Claude Code, Cursor, and Windsurf)
Structured assessment of an open-source agent framework's investment readiness. The novel value is in Steps 2-3: quantifying community health through contribution survival rates and measuring supersession risk — the most common reason external engineering effort is wasted. The final classification (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) calibrates resource allocation before committing development cycles.
Inputs:
- framework_url: GitHub URL of the framework repository
- comparison_frameworks: list of alternative framework URLs to benchmark against
- use_case: intended use case for architecture alignment assessment (e.g., "multi-agent orchestration", "tool-use pipelines")
- contribution_budget: planned engineering hours, for calibrating the investment tier
Step 1: Collect foundational data about the project's size, activity, and landscape position before deeper analysis.
- Read README.md, CONTRIBUTING.md, LICENSE, and any architecture docs (docs/, ARCHITECTURE.md)
- gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
- gh api repos/<owner>/<repo>/dependents (the GitHub REST API documents no dependents endpoint, so if this returns 404, read the count from https://github.com/<owner>/<repo>/network/dependents)
- gh release list --limit 10: note the release frequency and whether releases follow semver
- If comparison_frameworks is provided, gather the same metrics for each alternative
Expected: Census table with stars, forks, dependents, release cadence, bus factor, and landscape position for the target (and comparisons if provided).
On failure: If the repository is private or API-rate-limited, fall back to manual README analysis. If metrics are unavailable (e.g., self-hosted GitLab), note the gap and proceed with qualitative assessment.
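The census arithmetic can be sketched in Python. This is a minimal sketch, assuming you have already collected release dates and tag names (for instance from gh release list) and per-author commit counts (for instance from git shortlog -sn HEAD). The function names are hypothetical, and the bus-factor definition used here (the fewest authors whose commits cover half of all commits) is an assumption; the skill does not say how to compute it.

```python
from __future__ import annotations

import re
from datetime import date
from statistics import median

# Loose SemVer 2.0.0 check. Tolerating a leading "v" (as in v1.2.3) is an
# assumption: the spec itself does not include that prefix.
SEMVER_TAG = re.compile(
    r"^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def release_cadence_days(published: list[date]) -> float | None:
    """Median gap in days between consecutive releases; None if fewer than two."""
    if len(published) < 2:
        return None
    ordered = sorted(published)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps)


def semver_share(tags: list[str]) -> float:
    """Fraction of release tags that look like SemVer versions."""
    if not tags:
        return 0.0
    return sum(1 for tag in tags if SEMVER_TAG.match(tag)) / len(tags)


def bus_factor(commits_by_author: dict[str, int], threshold: float = 0.5) -> int:
    """Fewest authors whose combined commits reach `threshold` of all commits
    (hypothetical definition, see the lead-in)."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    covered = 0
    # Greedily take the most prolific authors first.
    ranked = sorted(commits_by_author.values(), reverse=True)
    for count, commits in enumerate(ranked, start=1):
        covered += commits
        if covered >= threshold * total:
            return count
    return len(commits_by_author)
```

A low bus factor together with a long median release gap is the kind of combination the census table should surface for the later steps.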
Step 2: Quantify whether the project welcomes, supports, and retains external contributors.
- gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
- Compute survival_rate = merged_external_PRs / total_external_PRs
- Check that CONTRIBUTING.md exists and is actionable (not just "submit a PR")
- Check that CODE_OF_CONDUCT.md exists
Expected: Community health scorecard with survival rate, response times, diversity ratio, and governance artifact checklist.
On failure: If PR data is insufficient (new project with <20 closed PRs), note the sample size limitation and weight other signals more heavily. If the project uses a non-GitHub platform, adapt the queries to that platform's API.
Step 3: Determine how likely external contributions are to be rendered obsolete by internal development, the single biggest risk for framework adopters and contributors.
- Compute supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
Expected: Supersession rate as a percentage, with a breakdown by type (reverted/rewritten/obsoleted), plus a roadmap overlap assessment.
On failure: If commit history is shallow or squash-merged (losing attribution), estimate supersession by comparing external PR file paths against files changed in subsequent releases. Note reduced confidence in the estimate.
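The supersession formula and the failure-mode fallback can be sketched as follows. Classifying each merged external PR as reverted, rewritten, obsoleted, or surviving is manual judgment and is assumed to be done already. The fallback treats a PR as superseded when every file it touched was changed again in a later release, a crude, assumed proxy that overestimates supersession when later edits were unrelated. Function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

FATES = ("reverted", "rewritten", "obsoleted")


def supersession_rate(fates: dict[int, str]) -> dict:
    """`fates` maps each merged external PR number to "reverted",
    "rewritten", "obsoleted", or "surviving" (assigned by hand)."""
    counts = Counter(fates.values())
    unknown = set(counts) - set(FATES) - {"surviving"}
    if unknown:
        raise ValueError(f"unknown fate labels: {sorted(unknown)}")
    total = len(fates)
    superseded = sum(counts[f] for f in FATES)
    return {
        "total_merged_external": total,
        "supersession_rate": superseded / total if total else None,
        # Per-type shares of all merged external PRs.
        "breakdown": {f: (counts[f] / total if total else 0.0) for f in FATES},
    }


def overlap_estimate(pr_paths: dict[int, set[str]], later_changed: set[str]) -> float | None:
    """Low-confidence fallback for squash-merged or shallow histories: share of
    external PRs whose touched files were all changed again in later releases."""
    if not pr_paths:
        return None
    hits = sum(1 for paths in pr_paths.values() if paths and paths <= later_changed)
    return hits / len(pr_paths)
```

Multiply the rate by 100 for the percentage the expected output asks for, and report the overlap estimate as explicitly lower confidence.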
Step 4: Assess whether the framework's architecture supports your use case without excessive lock-in.
- If use_case is provided, evaluate whether the framework's architecture naturally supports it
Expected: Architecture alignment report with extension point inventory, lock-in risk assessment (low/medium/high), API stability score, and use-case fit evaluation.
On failure: If architecture documentation is sparse, derive the assessment from code structure and public API surface. If the framework is too young for stability history, note this and weight governance signals more heavily.
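One way to turn release history into an API stability score, sketched under stated assumptions: the score is the fraction of consecutive releases that did not make a breaking version bump. A major bump counts as breaking and, because SemVer says anything may change under major version zero, a 0.x minor bump is conservatively counted as breaking too. The scoring rule and function names are hypothetical; the skill does not define the score.

```python
from __future__ import annotations

import re

VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def is_breaking(prev: tuple[int, ...], curr: tuple[int, ...]) -> bool:
    if curr[0] != prev[0]:
        return True  # major bump
    # Assumed convention for 0.x: a minor bump may break the API.
    return curr[0] == 0 and curr[1] != prev[1]


def api_stability_score(tags_oldest_first: list[str]) -> float | None:
    """Fraction of consecutive release transitions that are non-breaking.
    Tags that do not parse as versions are skipped. Returns None with fewer
    than two versioned releases (too young for a stability history)."""
    versions = []
    for tag in tags_oldest_first:
        match = VERSION.match(tag)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    if len(versions) < 2:
        return None
    pairs = list(zip(versions, versions[1:]))
    stable = sum(1 for prev, curr in pairs if not is_breaking(prev, curr))
    return stable / len(pairs)
```

A None result corresponds to the failure case above, where governance signals should carry more weight.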
Step 5: Evaluate whether the project's governance model supports long-term viability and fair treatment of external contributors.
- … SECURITY.md or equivalent)
Expected: Governance assessment with model classification, sustainability rating (sustainable/at-risk/critical), contributor protection evaluation, and security posture summary.
On failure: If governance information is undocumented, treat the absence itself as a yellow flag. Check for implicit governance by examining who merges PRs, who closes issues, and who makes release decisions.
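The implicit-governance check (who merges PRs) can be approximated by measuring how concentrated merge activity is. This sketch assumes you have a list of merger logins, for example from the mergedBy field of gh pr list --json if your gh version exposes it. The rating thresholds are hypothetical and only illustrate mapping concentration onto sustainable/at-risk/critical.

```python
from __future__ import annotations

from collections import Counter


def merge_concentration(mergers: list[str | None]) -> dict:
    """Share of merges by the most active merger, plus how many people merge."""
    counts = Counter(m for m in mergers if m)  # drop unknown/None entries
    total = sum(counts.values())
    if total == 0:
        return {"distinct_mergers": 0, "top_merger": None, "top_share": None}
    top, n = counts.most_common(1)[0]
    return {"distinct_mergers": len(counts), "top_merger": top, "top_share": n / total}


def sustainability_rating(distinct_mergers: int, top_share: float | None) -> str:
    # Hypothetical cut-offs for illustration; the skill does not specify any.
    if top_share is None or distinct_mergers <= 1:
        return "critical"  # no merge activity, or a single gatekeeper
    if top_share >= 0.8 or distinct_mergers < 3:
        return "at-risk"
    return "sustainable"
```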
Step 6: Synthesize all findings into a four-tier classification with specific justifications and actionable recommendations.
- If contribution_budget was provided, recommend how to allocate those hours given the tier
- If comparison_frameworks were evaluated, produce a comparison matrix ranking all frameworks
Expected: Classification report with tier, dimension scores, evidence summary, and actionable recommendations tailored to the investment context.
On failure: If data gaps prevent confident classification, default to EVALUATE-FURTHER with explicit documentation of what data is missing and how to obtain it. Never default to INVEST when uncertain.
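A sketch of how dimension scores might roll up into the four tiers and the comparison matrix. The 0-1 scores, cut-offs, and tie-breaking are hypothetical calibration choices; what comes from the skill is the tier names, the rule that data gaps default to EVALUATE-FURTHER, and never defaulting to INVEST when uncertain.

```python
from __future__ import annotations

# Dimension scores are 0-1, higher is better. Score supersession as
# 1 - supersession_rate so that a low rate yields a high score.
DIMENSIONS = ("community_health", "supersession", "architecture_alignment", "governance")
TIER_ORDER = ("INVEST", "EVALUATE-FURTHER", "CONTRIBUTE-CAUTIOUSLY", "AVOID")


def classify(scores: dict[str, float | None]) -> str:
    """Map dimension scores to a tier. Cut-offs below are hypothetical."""
    if any(scores.get(d) is None for d in DIMENSIONS):
        # Data gaps: never INVEST when uncertain; default to EVALUATE-FURTHER.
        return "EVALUATE-FURTHER"
    values = [scores[d] for d in DIMENSIONS]
    worst, mean = min(values), sum(values) / len(values)
    if worst < 0.3:
        return "AVOID"
    if mean >= 0.75 and worst >= 0.6:
        return "INVEST"
    if scores["supersession"] < 0.5:
        # Nothing disqualifying, but contributions risk being superseded.
        return "CONTRIBUTE-CAUTIOUSLY"
    return "EVALUATE-FURTHER"


def rank(frameworks: dict[str, dict[str, float | None]]) -> list[tuple[str, str]]:
    """Comparison-matrix order: by tier, then by mean of the known scores."""
    rows = []
    for name, scores in frameworks.items():
        known = [s for s in scores.values() if s is not None]
        mean = sum(known) / len(known) if known else 0.0
        rows.append((TIER_ORDER.index(classify(scores)), -mean, name))
    rows.sort()
    return [(name, TIER_ORDER[tier]) for tier, _, name in rows]
```

Ranking follows the tier order in which the skill lists them, with higher mean scores first within a tier.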