skills/research/autonomous-llm-research/SKILL.md
Run durable end-to-end LLM post-training research loops from zero-spec ideas to finished reports. Use when the goal is autonomous literature review, hypothesis selection, experiment planning, Tinker training, evaluation, checkpointing, and resumable long-running research work.
npx skillsauth add aum08desai/hermes-research-agent autonomous-llm-research
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill as the default operating procedure for Hermes Research Agent.
- `zero_spec`: the user gives a broad or vague goal. Start with literature and idea generation.
- `partial_spec`: the user gives some constraints. Fill the missing dataset, eval, and training details.
- `full_spec`: the user gives an executable recipe. Execute it unless there is a contradiction or an approval gate.

- `research_loop` or `research_state`.
- `.hermes-research/`.
- `zero_spec`, the sequence is mandatory:

- `research_manager(action="triage_literature", ...)` to rank papers and identify gap candidates.
- `research_manager(action="assess_dataset", ...)` before trusting newly generated or curated datasets.
- `research_manager(action="rank_runs", ...)` after evaluations to detect regressions versus baseline.
- `research_manager(action="plan_next_step", ...)` before deciding the next experiment.
- `research_manager(action="write_research_memo", ...)` whenever a loop chunk meaningfully changes the project state.
- `research_loop(action="checkpoint_loop", ...)` before context or time limits are likely.
- `research_loop(action="schedule_continuation", ...)` or rely on the checkpoint helper to resume in a fresh session.
- `research_loop(action="monitor_run", ...)` so the project can return after hours of work.

- `research-idea-generation` when the project starts from little or no specification.
- `literature-to-experiment` when turning papers or blog posts into concrete hypotheses and experiment plans.
- `tinker` when preparing or launching post-training runs.
- `eval-and-ablation` after any training run completes or when planning comparisons.
- `research-reporting` when writing iteration reports, summaries, or final deliverables.

Every completed loop or chunk should leave behind:
- `.hermes-research/`
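
The three spec modes listed earlier (`zero_spec`, `partial_spec`, `full_spec`) can be illustrated with a small sketch. This is not how the skill itself decides the mode; the `ResearchSpec` class, its field names, and both helper functions are hypothetical, introduced only to show one plausible way to tell the modes apart by which of the dataset, eval, and training details the user has supplied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchSpec:
    """Hypothetical container for what the user provided (not part of the skill)."""
    goal: str
    dataset: Optional[str] = None
    eval_suite: Optional[str] = None
    training_recipe: Optional[str] = None


# Assumed names for the three details the skill says must be filled in.
DETAIL_FIELDS = ("dataset", "eval_suite", "training_recipe")


def missing_details(spec: ResearchSpec) -> List[str]:
    """Return the detail fields the user has not supplied yet."""
    return [name for name in DETAIL_FIELDS if getattr(spec, name) is None]


def classify_spec(spec: ResearchSpec) -> str:
    """Map a spec onto one of the three modes named in the skill."""
    missing = missing_details(spec)
    if len(missing) == len(DETAIL_FIELDS):
        return "zero_spec"      # broad goal only: start with literature and ideas
    if not missing:
        return "full_spec"      # executable recipe: run it, barring contradictions
    return "partial_spec"       # some constraints: fill in what is missing


vague = ResearchSpec(goal="improve small-model reasoning")
partial = ResearchSpec(goal="improve small-model reasoning", dataset="my-math-set")
print(classify_spec(vague), classify_spec(partial), missing_details(partial))
```

In this sketch a `partial_spec` result pairs naturally with `missing_details`, which names what still has to be decided before an experiment plan is complete.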