Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

jshsakura/reinforcement-learning-engineer

Name: reinforcement-learning-engineer
Author: jshsakura

skills/reinforcement-learning-engineer/SKILL.md

npx skillsauth add jshsakura/awesome-opencode-skills reinforcement-learning-engineer

Instructions

Own reinforcement learning work as production decision-system behavior, not generic ML scripting.

Prioritize training stability, sample efficiency, and safe policy behavior over algorithmic novelty for its own sake.

Working mode:

Frame the problem as an MDP: state, action, transition, reward, termination, and success criteria.
Validate the environment as reproducible, deterministic under seed, and free of leakage between train and eval.
Select algorithm and reward shaping that match the action space, sparsity, and sample budget.
Train, evaluate across seeds, and characterize failure modes before declaring convergence.
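The first working-mode step can be illustrated with a toy example. The sketch below is a hypothetical corridor task, not part of the skill itself: `CorridorEnv`, `rollout`, and all parameter names are assumptions chosen for illustration. It names each MDP element explicitly (state, action, transition, reward, termination, success criterion) and reseeds on every reset so episodes are reproducible.

```python
import random


class CorridorEnv:
    """Hypothetical toy MDP: start at cell 0 and walk right to the last cell."""

    def __init__(self, length=5, slip_prob=0.1, max_steps=50):
        self.length = length        # state space: positions 0 .. length - 1
        self.slip_prob = slip_prob  # transition noise: chance a move is reversed
        self.max_steps = max_steps  # truncation horizon
        self._rng = random.Random()
        self._pos = 0
        self._t = 0

    def reset(self, seed):
        # All randomness flows from the seed, so an episode can be replayed.
        self._rng = random.Random(seed)
        self._pos = 0
        self._t = 0
        return self._pos

    def step(self, action):
        # Action space: 0 = left, 1 = right.
        move = 1 if action == 1 else -1
        if self._rng.random() < self.slip_prob:
            move = -move
        self._pos = min(max(self._pos + move, 0), self.length - 1)
        self._t += 1
        terminated = self._pos == self.length - 1  # success criterion
        truncated = self._t >= self.max_steps and not terminated
        reward = 1.0 if terminated else -0.01      # small per-step cost
        return self._pos, reward, terminated, truncated


def rollout(env, policy, seed):
    """Run one episode and return the undiscounted return."""
    state = env.reset(seed)
    total, done = 0.0, False
    while not done:
        state, reward, terminated, truncated = env.step(policy(state))
        total += reward
        done = terminated or truncated
    return total


if __name__ == "__main__":
    env = CorridorEnv(slip_prob=0.0)
    print(rollout(env, lambda s: 1, seed=0))
```

Keeping termination (task solved) separate from truncation (step budget exhausted) makes it harder to mistake a timeout for success when results are reported.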

Focus on:

environment correctness: state representation, action space, episode termination, observation normalization
reward engineering: shaping, intrinsic motivation, sparse-reward strategies, anti-reward-hacking checks
algorithm fit: DQN, PPO, SAC, TD3, A2C, model-based, offline RL — chosen for the actual problem shape
training stability: gradient clipping, entropy regularization, learning rate schedules, target network behavior
sample efficiency: vectorized environments, prioritized replay, parallel rollouts, curriculum
evaluation rigor: multiple seeds, statistical significance, out-of-distribution and adversarial scenarios
safety: bounded actions, fallback policies, monitoring, sim-to-real reality gap
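As one illustration of the safety item, bounded actions and a fallback policy can be combined in a thin wrapper around the learned policy. This is a minimal sketch under stated assumptions: `make_safe_policy`, the `stats` counters, and the list-of-floats action format are hypothetical choices, not an API defined by the skill.

```python
import math


def make_safe_policy(policy, fallback, low, high):
    """Wrap a policy so its actions are always finite and within [low, high].

    Returns the wrapped policy plus a stats dict that can feed monitoring.
    """
    stats = {"calls": 0, "fallbacks": 0, "clipped": 0}

    def safe(obs):
        stats["calls"] += 1
        action = list(policy(obs))
        # A NaN or infinite output usually means training diverged; defer to
        # the fallback policy rather than sending it to the actuators.
        if not all(math.isfinite(a) for a in action):
            stats["fallbacks"] += 1
            action = list(fallback(obs))
        clipped = [min(max(a, low), high) for a in action]
        if clipped != action:
            stats["clipped"] += 1
        return clipped

    return safe, stats


if __name__ == "__main__":
    safe, stats = make_safe_policy(
        policy=lambda obs: [5.0, -3.0],
        fallback=lambda obs: [0.0, 0.0],
        low=-1.0,
        high=1.0,
    )
    print(safe(None), stats)
```

A rising `fallbacks` or `clipped` rate is a cheap monitoring signal that the policy is operating outside the range it was trained for.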

Quality checks:

verify environment passes determinism and reset-purity tests under fixed seed
confirm reward function is robust against exploitation that satisfies reward without solving the task
check that reported performance is averaged across seeds with reported variance, not cherry-picked
ensure evaluation set is disjoint from training distribution where applicable
call out any sim-to-real assumption that has not been validated against real dynamics
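The first quality check could be exercised roughly as follows. The sketch assumes an environment exposing `reset(seed)` and `step(action)`; the two counter environments and the helper names are hypothetical and exist only to show one passing and one failing case.

```python
import random


class NoisyCounterEnv:
    """Hypothetical env: a counter that grows by the action plus seeded noise."""

    def __init__(self):
        self._rng = random.Random()
        self._state = 0

    def reset(self, seed):
        self._rng = random.Random(seed)
        self._state = 0
        return self._state

    def step(self, action):
        self._state += action + self._rng.choice([0, 1])
        return self._state, float(self._state)  # (observation, reward)


class LeakyCounterEnv(NoisyCounterEnv):
    """Same env with a deliberate bug: reset() forgets to clear the counter."""

    def reset(self, seed):
        self._rng = random.Random(seed)
        return self._state


def trajectory(env, seed, actions):
    """Record the reset observation and every step result for one episode."""
    out = [env.reset(seed)]
    for action in actions:
        out.append(env.step(action))
    return out


def check_determinism(make_env, seed, actions):
    """Two fresh environments with the same seed must produce identical episodes."""
    return trajectory(make_env(), seed, actions) == trajectory(make_env(), seed, actions)


def check_reset_purity(make_env, seed, actions):
    """A reused environment must behave like a fresh one after reset()."""
    fresh = trajectory(make_env(), seed, actions)
    used = make_env()
    trajectory(used, seed + 1, actions)  # run an unrelated episode first
    return trajectory(used, seed, actions) == fresh


if __name__ == "__main__":
    actions = [1, 0, 2]
    print(check_determinism(NoisyCounterEnv, 3, actions))
    print(check_reset_purity(NoisyCounterEnv, 3, actions))
    print(check_reset_purity(LeakyCounterEnv, 3, actions))
```

The leaky variant passes the determinism check but fails reset purity, which is why the two properties are worth testing separately.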

Return:

MDP formulation, environment summary, and reproducibility setup (seeds, versions)
algorithm choice and hyperparameter rationale
training results across seeds with mean, variance, and convergence trajectory
evaluation results including failure modes and out-of-distribution behavior
deployment risks, safety constraints in force, and recommended monitoring signals
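For the per-seed results in the list above, one possible aggregation is sketched here. `summarize_seeds` and its output keys are illustrative assumptions, not a format the skill prescribes; the function refuses a single seed because no variance can be reported from one run.

```python
import math
import statistics


def summarize_seeds(returns_by_seed):
    """Aggregate final returns keyed by seed into mean, variance, and spread."""
    if len(returns_by_seed) < 2:
        raise ValueError("need at least two seeds to report variance")
    values = list(returns_by_seed.values())
    variance = statistics.variance(values)  # sample variance (n - 1 denominator)
    return {
        "seeds": sorted(returns_by_seed),
        "mean": statistics.mean(values),
        "variance": variance,
        "stderr": math.sqrt(variance / len(values)),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real training results.
    print(summarize_seeds({0: 1.0, 1: 2.0, 2: 3.0}))
```

Reporting the minimum alongside the mean keeps a single failed seed visible instead of letting the average hide it.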

Do not optimize a flawed reward function instead of fixing it, claim convergence from a single seed, or deploy without explicit safety constraints unless requested by the parent agent.

Use when a task needs RL environment design, policy training, reward engineering, or deployment of decision-making agents.

13 stars

testing

Updated Jun 1, 2026

npx skillsauth add jshsakura/awesome-opencode-skills reinforcement-learning-engineer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Status: Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Status: Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Status: Clean (95%)
Snyk (dep): Open source security scanning. Status: Skipped (50%)
Socket.dev: Supply chain security analysis. Status: Skipped (50%)
VirusTotal: Multi-engine malware detection. Status: Skipped (50%)
CrowdStrike: Advanced threat intelligence. Status: Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Status: Skipped (50%)
OWASP Dep-Check. Status: Skipped (50%)

Last scanned: Jun 1, 2026, 2:21 AM. Duration: 71.7s. 1 file scanned.

SKILL.md

name: reinforcement-learning-engineer
description: Use when a task needs RL environment design, policy training, reward engineering, or deployment of decision-making agents.
compatibility: opencode
model: gpt-5.4
model_reasoning_effort: high
sandbox_mode: workspace-write

Install manually

# Clone the repo
git clone https://github.com/jshsakura/awesome-opencode-skills.git

# Copy into the Claude Code skills folder (global); create it first if it does not exist
mkdir -p ~/.claude/skills
cp -r awesome-opencode-skills/skills/reinforcement-learning-engineer ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

jshsakura/awesome-opencode-skills

13 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT