aum08desai/autonomous-llm-research

Name: autonomous-llm-research
Author: aum08desai

skills/research/autonomous-llm-research/SKILL.md

Run durable end-to-end LLM post-training research loops from zero-spec ideas to finished reports. Use when the goal is autonomous literature review, hypothesis selection, experiment planning, Tinker training, evaluation, checkpointing, and resumable long-running research work.

6 stars

development

Updated Apr 3, 2026

Install globally with skillsauth:

npx skillsauth add aum08desai/hermes-research-agent autonomous-llm-research

This single command installs the skill globally and works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean; the remaining 6 were skipped. Last scanned: Apr 3, 2026, 11:18 AM (201.0 s, 2 files scanned).

Clean:

Trivy: container and dependency vulnerability scanner (95%)
Semgrep: static code analysis for vulnerabilities (95%)
mcp-scan (Snyk): Model Context Protocol security validation (95%)

Skipped:

Snyk (dep): open source security scanning (50%)
Socket.dev: supply chain security analysis (50%)
VirusTotal: multi-engine malware detection (50%)
CrowdStrike: advanced threat intelligence (50%)
OSV-Scanner: Open Source Vulnerability database check (50%)
OWASP Dep-Check (50%)

SKILL.md

name: autonomous-llm-research
description: Run durable end-to-end LLM post-training research loops from zero-spec ideas to finished reports. Use when the goal is autonomous literature review, hypothesis selection, experiment planning, Tinker training, evaluation, checkpointing, and resumable long-running research work.
version: 1.0.0
author: Hermes Research Agent
license: MIT
tags: [research, llm, post-training, autonomy, tinker, evaluation]

Autonomous LLM Research

Use this skill as the default operating procedure for Hermes Research Agent.

Core rules

Detect the request type first:

zero_spec: the user gives a broad or vague goal. Start with literature and idea generation.
partial_spec: the user gives some constraints. Fill in the missing dataset, eval, and training details.
full_spec: the user gives an executable recipe. Execute it unless there is a contradiction or an approval gate.
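The three request types can be read as "how much of the recipe did the user supply". The sketch below makes that concrete under a simplifying assumption: a request is reduced to three optional fields (dataset, eval, training recipe), which are hypothetical names chosen to mirror the details partial_spec says to fill in. Real classification is a judgment call on free-form text, and the contradiction and approval-gate checks for full_spec are not modeled here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RequestType(Enum):
    ZERO_SPEC = "zero_spec"
    PARTIAL_SPEC = "partial_spec"
    FULL_SPEC = "full_spec"


@dataclass
class ResearchRequest:
    # Hypothetical fields: the skill names dataset, eval, and training details
    # as the pieces a partial_spec request may leave out.
    goal: str
    dataset: Optional[str] = None
    eval_suite: Optional[str] = None
    training_recipe: Optional[str] = None


def classify_request(req: ResearchRequest) -> RequestType:
    """Map how much of the recipe the user supplied to a request type."""
    provided = [req.dataset, req.eval_suite, req.training_recipe]
    count = sum(1 for field in provided if field)
    if count == 0:
        return RequestType.ZERO_SPEC     # broad goal: start with literature and ideas
    if count == len(provided):
        return RequestType.FULL_SPEC     # executable recipe: execute it
    return RequestType.PARTIAL_SPEC      # some constraints: fill in the rest


def missing_details(req: ResearchRequest) -> list[str]:
    """List the details a partial_spec request still needs."""
    names = {"dataset": req.dataset, "eval": req.eval_suite, "training": req.training_recipe}
    return [name for name, value in names.items() if not value]
```

For example, a bare goal with no dataset, eval, or recipe classifies as ZERO_SPEC, while a goal plus a dataset is PARTIAL_SPEC with eval and training still to be filled in.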

Be state-first:

Before substantive work, create or resume a project with research_loop or research_state.
Persist ideas, literature notes, hypotheses, experiment plans, results, checkpoints, and reports under .hermes-research/.
Treat chat history as ephemeral. Treat on-disk state as canonical.
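A minimal sketch of "state-first" create-or-resume behavior follows. It assumes a hypothetical layout of one JSON file per project under .hermes-research/<project>/project.json; the real research_state and research_loop tools may store state differently. The point it illustrates is that the on-disk file, not the conversation, is the source of truth, and that writes go through a temporary file plus os.replace so an interrupted session does not leave a half-written state file.

```python
import json
import os
from pathlib import Path

STATE_DIR = ".hermes-research"   # directory named in the skill
STATE_FILE = "project.json"      # hypothetical file name, for illustration only


def save_state(root: Path, state: dict) -> None:
    """Write the state via a temp file so a crash never leaves a partial file."""
    path = root / STATE_DIR / state["name"] / STATE_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)   # atomic rename on POSIX filesystems


def create_or_resume(root: Path, name: str) -> dict:
    """Load the project state from disk if it exists, otherwise create it."""
    path = root / STATE_DIR / name / STATE_FILE
    if path.exists():
        # Resume: whatever is on disk is canonical, regardless of chat history.
        return json.loads(path.read_text(encoding="utf-8"))
    state = {
        "name": name,
        "ideas": [], "literature_notes": [], "hypotheses": [],
        "experiment_plans": [], "results": [], "checkpoints": [], "reports": [],
    }
    save_state(root, state)
    return state
```

A fresh session calls create_or_resume with the same project name and picks up exactly what the previous session persisted.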

For zero_spec requests, this sequence is mandatory:

Search literature and identify concrete gaps.
Generate 3 candidate ideas.
Score them on novelty, feasibility, cost, and evaluation clarity.
Choose one and record why.
Write a hypothesis, success metric, stop condition, and first experiment plan before training.
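The scoring and selection steps can be sketched as a weighted sum over the four criteria the skill names. The 1-5 scale, the equal weights, and the convention that a higher "cost" score means cheaper to run are all assumptions for illustration; the skill specifies the criteria, not how to combine them. The function also returns a rationale string, since the skill requires recording why an idea was chosen.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    title: str
    # Hypothetical 1-5 scale where higher is better for every criterion,
    # so "cost" here means cheapness (5 = cheapest to run).
    novelty: int
    feasibility: int
    cost: int
    eval_clarity: int


# Hypothetical equal weights; tune to the project's priorities.
WEIGHTS = {"novelty": 1.0, "feasibility": 1.0, "cost": 1.0, "eval_clarity": 1.0}


def score(idea: Idea) -> float:
    """Weighted sum of the four criteria."""
    return sum(weight * getattr(idea, name) for name, weight in WEIGHTS.items())


def choose(ideas: list[Idea]) -> tuple[Idea, str]:
    """Pick the highest-scoring idea and return a rationale to persist with it."""
    if len(ideas) != 3:
        raise ValueError("zero_spec expects exactly 3 candidate ideas")
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(ideas, key=score, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    rationale = (f"Chose '{best.title}' (score {score(best):.1f}) over "
                 f"'{runner_up.title}' (score {score(runner_up):.1f}).")
    return best, rationale
```

Comparing against the runner-up in the rationale keeps the recorded reason specific rather than just restating the winner's score.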

Never launch training without:

a written hypothesis
a written success metric
a written stop condition
a persisted experiment plan
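The four prerequisites above amount to a pre-launch gate. The sketch below is one way to enforce it, assuming a hypothetical ExperimentPlan record whose plan_path field points at the persisted plan file; it is not the skill's own check, just an illustration of refusing to start training until every item is present.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ExperimentPlan:
    hypothesis: str = ""
    success_metric: str = ""
    stop_condition: str = ""
    plan_path: Optional[Path] = None   # hypothetical: where the plan was persisted


class TrainingGateError(RuntimeError):
    """Raised when a training launch is attempted without its prerequisites."""


def check_training_gate(plan: ExperimentPlan) -> None:
    """Raise unless all four prerequisites listed in the skill are in place."""
    missing = []
    if not plan.hypothesis.strip():
        missing.append("written hypothesis")
    if not plan.success_metric.strip():
        missing.append("written success metric")
    if not plan.stop_condition.strip():
        missing.append("written stop condition")
    if plan.plan_path is None or not plan.plan_path.is_file():
        missing.append("persisted experiment plan")
    if missing:
        raise TrainingGateError(
            "refusing to launch training; missing: " + ", ".join(missing))
```

Listing every missing item in one error, rather than failing on the first, tells the agent everything it still has to write before retrying.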

Use the management layer after each major step:

Use research_manager(action="triage_literature", ...) to rank papers and identify gap candidates.
Use research_manager(action="assess_dataset", ...) before trusting newly generated or curated datasets.
Use research_manager(action="rank_runs", ...) after evaluations to detect regressions versus baseline.
Use research_manager(action="plan_next_step", ...) before deciding the next experiment.
Use research_manager(action="write_research_memo", ...) whenever a loop chunk meaningfully changes the project state.
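As a rough illustration of the management layer, the sketch pairs each loop event with the research_manager action the skill prescribes, then shows one generic way a "rank runs and detect regressions versus baseline" step could work. The event names, the higher-is-better metric, and the tolerance parameter are assumptions; this is not the implementation behind research_manager(action="rank_runs", ...).

```python
from dataclasses import dataclass

# Hypothetical event names mapped to the management actions listed in the skill.
MANAGEMENT_ACTIONS = {
    "literature_search_done": "triage_literature",
    "dataset_created": "assess_dataset",
    "evaluation_done": "rank_runs",
    "before_next_experiment": "plan_next_step",
    "state_changed": "write_research_memo",
}


@dataclass
class RunResult:
    run_id: str
    metric: float   # assumed: higher is better


def rank_against_baseline(runs: list[RunResult], baseline: float,
                          tolerance: float = 0.0) -> tuple[list[RunResult], list[str]]:
    """Sort runs best-first and flag any that fall below baseline minus tolerance."""
    ranked = sorted(runs, key=lambda r: r.metric, reverse=True)
    regressions = [r.run_id for r in ranked if r.metric < baseline - tolerance]
    return ranked, regressions
```

The tolerance keeps run-to-run noise from being reported as a regression; what counts as noise depends on the eval and is a per-project choice.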

Long-running work must be resumable:

Use research_loop(action="checkpoint_loop", ...) before you are likely to hit context or time limits.
Use research_loop(action="schedule_continuation", ...), or rely on the checkpoint helper, to resume in a fresh session.
When Tinker runs are active, use research_loop(action="monitor_run", ...) so the project can pick the runs back up after hours of work.
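The checkpoint-and-resume pattern behind these rules can be sketched generically: persist progress after every step, stop early when the remaining time budget drops below a safety margin, and let the next session continue from the saved step index. The checkpoint format, budget, and margin parameters here are hypothetical, and the step body is a placeholder; the skill's own checkpoint_loop and schedule_continuation actions handle this for real runs.

```python
import json
import os
import time
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp, path)   # never leave a half-written checkpoint behind


def load_checkpoint(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"next_step": 0, "done": []}


def run_loop(steps: list[str], ckpt: Path, budget_s: float, margin_s: float) -> dict:
    """Run steps in order, checkpointing after each one, and pause early
    when the remaining time budget falls below the safety margin."""
    state = load_checkpoint(ckpt)
    start = time.monotonic()
    while state["next_step"] < len(steps):
        if budget_s - (time.monotonic() - start) < margin_s:
            state["status"] = "paused"   # a fresh session resumes from here
            save_checkpoint(ckpt, state)
            return state
        step = steps[state["next_step"]]
        state["done"].append(step)       # placeholder for the real work
        state["next_step"] += 1
        save_checkpoint(ckpt, state)
    state["status"] = "complete"
    save_checkpoint(ckpt, state)
    return state
```

Checkpointing before the limit, rather than at it, is the design choice the first rule encodes: the margin has to cover the time needed to write state out cleanly.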

When to load other skills

Load research-idea-generation when the project starts from little or no specification.
Load literature-to-experiment when turning papers or blog posts into concrete hypotheses and experiment plans.
Load tinker when preparing or launching post-training runs.
Load eval-and-ablation after any training run completes or when planning comparisons.
Load research-reporting when writing iteration reports, summaries, or final deliverables.
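The loading rules above form a small decision table. The sketch encodes it as a function of situation flags; the flag names are hypothetical, while the skill names come straight from the list. Several flags can be true at once, in which case more than one skill is loaded.

```python
def skills_to_load(*, low_spec: bool = False, from_papers: bool = False,
                   training: bool = False, run_completed: bool = False,
                   planning_comparison: bool = False,
                   writing_report: bool = False) -> list[str]:
    """Return the companion skills to load for the current situation."""
    skills = []
    if low_spec:                          # little or no specification
        skills.append("research-idea-generation")
    if from_papers:                       # turning papers/posts into experiments
        skills.append("literature-to-experiment")
    if training:                          # preparing or launching post-training
        skills.append("tinker")
    if run_completed or planning_comparison:
        skills.append("eval-and-ablation")
    if writing_report:                    # iteration reports, summaries, finals
        skills.append("research-reporting")
    return skills
```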

Default output expectations

Every completed loop or chunk should leave behind:

updated project state in .hermes-research/
experiment and run metadata
at least one written report or memo
an inbox item when the work should be reviewed later
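An end-of-chunk check for these outputs might look like the sketch below. It assumes a hypothetical project layout (a project.json state file plus runs/, reports/, and inbox/ subdirectories under the project's .hermes-research/ folder); the skill only requires that these kinds of artifacts exist, not these names.

```python
from pathlib import Path


def check_chunk_outputs(project_dir: Path, needs_review: bool) -> list[str]:
    """Return the expected outputs that are still missing after a loop chunk.

    The file and directory names are hypothetical placeholders for the
    artifacts the skill requires.
    """
    expected = {
        "project state": project_dir / "project.json",
        "experiment and run metadata": project_dir / "runs",
        "report or memo": project_dir / "reports",
    }
    if needs_review:
        expected["inbox item"] = project_dir / "inbox"
    missing = []
    for label, path in expected.items():
        if path.is_dir():
            present = any(path.iterdir())   # an empty directory does not count
        else:
            present = path.is_file()
        if not present:
            missing.append(label)
    return missing
```

Treating an empty directory as missing catches the common failure where a folder was created but nothing was ever written into it.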

Related Skills

aum08desai/writing-plans (development; 6 stars; updated Apr 3, 2026)

Use when you have a spec or requirements for a multi-step task. Creates comprehensive implementation plans with bite-sized tasks, exact file paths, and complete code examples.

aum08desai/test-driven-development (development; 6 stars; updated Apr 3, 2026)

Use when implementing any feature or bugfix, before writing implementation code. Enforces the RED-GREEN-REFACTOR cycle with a test-first approach.

aum08desai/systematic-debugging (development; 6 stars; updated Apr 3, 2026)

Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation: NO fixes without understanding the problem first.

aum08desai/subagent-driven-development (development; 6 stars; updated Apr 3, 2026)

Use when executing implementation plans with independent tasks. Dispatches a fresh delegate_task per task with two-stage review (spec compliance, then code quality).


Install manually

For Claude Code:

# Clone the repo
git clone https://github.com/aum08desai/hermes-research-agent.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r hermes-research-agent/skills/research/autonomous-llm-research ~/.claude/skills/

See the Claude Code Skills documentation for the official skills path.

Repository

aum08desai/hermes-research-agent

6 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
