miaodi/reinforcement-learning

Name: reinforcement-learning
Author: miaodi

skills/reinforcement-learning/SKILL.md

npx skillsauth add miaodi/llm_config reinforcement-learning

Reinforcement Learning Skill

Purpose

Design, implement, evaluate, and analyze reinforcement learning experiments with clear baselines, sound metrics, and algorithm-appropriate hyperparameters.

When To Use

Use for RL assignments, environment analysis, algorithm selection, training design, evaluation planning, hyperparameter tuning, and result interpretation.

Priorities

Match the algorithm to the environment and project goal.
Establish baselines before tuning advanced methods.
Evaluate stability, sample efficiency, and variance across seeds.
Analyze why a method works or fails, not just whether reward increased.

Workflow

Characterize the task: environment type, action space, observability, reward structure, horizon, and stochasticity.
Choose reasonable baselines and one or two advanced algorithms that fit the setting.
Define training budget, evaluation protocol, and success metrics before running experiments.
Select algorithm-specific hyperparameters to tune rather than using a generic sweep blindly.
Run repeated seeds and compare convergence speed, stability, and final performance.
Suggest plots that reveal behavior, not just final scores.
Analyze output in terms of variance, convergence, sample efficiency, brittleness, and failure modes.
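One way to make the workflow above concrete is to write the task profile and experiment plan down as data before any training starts, then check the plan against the steps. The sketch below is illustrative only: `TaskProfile`, `ExperimentPlan`, `validate`, the environment name `GridWorld-v0`, and every number in it are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Step 1: the task characteristics the workflow asks for."""
    environment: str
    action_space: str       # e.g. "discrete" or "continuous"
    observability: str      # e.g. "full" or "partial"
    reward_structure: str   # e.g. "dense" or "sparse"
    horizon: int            # maximum steps per episode
    stochastic: bool


@dataclass(frozen=True)
class ExperimentPlan:
    """Steps 2-5: everything fixed before experiments run."""
    task: TaskProfile
    baselines: tuple
    candidates: tuple       # the one or two advanced algorithms
    train_steps: int        # training budget in environment steps
    eval_every: int         # evaluation interval in environment steps
    eval_episodes: int
    seeds: tuple
    metrics: tuple


def validate(plan):
    """Return a list of problems; an empty list means the plan is ready to run."""
    problems = []
    if not plan.baselines:
        problems.append("no baseline defined")
    if not 1 <= len(plan.candidates) <= 2:
        problems.append("expected one or two advanced algorithms")
    if len(set(plan.seeds)) < 2:
        problems.append("need repeated seeds")
    if not 0 < plan.eval_every <= plan.train_steps:
        problems.append("evaluation interval must fit inside the training budget")
    if plan.eval_episodes <= 0:
        problems.append("need at least one evaluation episode")
    if not plan.metrics:
        problems.append("no success metrics defined")
    return problems


# Hypothetical example values, chosen only to show the shape of a plan.
plan = ExperimentPlan(
    task=TaskProfile("GridWorld-v0", "discrete", "full", "sparse", 100, False),
    baselines=("random policy", "tabular Q-learning"),
    candidates=("DQN",),
    train_steps=50_000,
    eval_every=5_000,
    eval_episodes=20,
    seeds=(0, 1, 2, 3, 4),
    metrics=("evaluation return", "success rate"),
)
print(validate(plan) or "plan is ready to run")
```

Freezing the dataclasses is a deliberate choice here: the plan is meant to be fixed before results come in, so it should not be edited once runs have started.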

Suggested Plots

reward versus environment steps
reward versus wall-clock time
mean and variance across seeds
ablation plots for key design choices
sensitivity plots for critical hyperparameters
success rate or episode length where relevant
policy/value diagnostics when they explain behavior
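Most of the plots above start from the same aggregation: per-seed curves reduced to a mean and a spread at each evaluation point. A minimal sketch using only the standard library follows. The function name `aggregate_across_seeds` is hypothetical, the numbers are placeholders rather than measured results, and it assumes every seed was evaluated at the same environment steps.

```python
from statistics import mean, stdev


def aggregate_across_seeds(curves):
    """Reduce per-seed curves to per-point mean and sample standard deviation.

    curves: one list of evaluation returns per seed, all logged at the same steps.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all seeds must share the same evaluation points")
    columns = list(zip(*curves))  # one tuple of per-seed values per evaluation point
    return [mean(col) for col in columns], [stdev(col) for col in columns]


# Placeholder numbers for illustration only, not measured results.
steps = [1000, 2000, 3000]
curves = [[0, 10, 20], [2, 14, 20], [4, 12, 26]]
means, stds = aggregate_across_seeds(curves)
for step, m, sd in zip(steps, means, stds):
    print(f"step {step}: mean={m:.1f} std={sd:.2f}")
```

The resulting `means` and `stds` can be passed to any plotting library as a line with a shaded band; the x-axis can be environment steps or wall-clock time depending on which plot is wanted.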

Hyperparameter Guidance

Tune algorithm-specific parameters first: learning rate, target-update cadence, entropy coefficient, clipping parameter, discount factor, lambda, rollout length, batch size, replay size, exploration schedule, and network size.
Prefer small targeted sweeps guided by algorithm behavior.
Track both the best setting and how sensitive performance is to it, not only the maximum score.
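As a rough sketch of tracking the best setting and sensitivity together, the snippet below summarizes a small one-parameter sweep. The helper `summarize_sweep`, the learning-rate values, and the scores are all hypothetical placeholders, and the spread between setting means is only one simple sensitivity measure among many.

```python
from statistics import mean, stdev


def summarize_sweep(results):
    """results maps a hyperparameter value to a list of per-seed final scores."""
    per_setting = {value: (mean(scores), stdev(scores))
                   for value, scores in results.items()}
    best = max(per_setting, key=lambda value: per_setting[value][0])
    setting_means = [m for m, _ in per_setting.values()]
    return {
        "best_value": best,
        "best_mean": per_setting[best][0],
        # How much the mean score moves across the swept values.
        "spread_across_settings": max(setting_means) - min(setting_means),
        # (mean, seed-to-seed standard deviation) for each swept value.
        "per_setting": per_setting,
    }


# Placeholder scores for illustration only, not measured results.
lr_sweep = {
    1e-4: [100, 110, 90],
    3e-4: [180, 190, 170],
    1e-3: [150, 60, 210],
}
summary = summarize_sweep(lr_sweep)
print("best learning rate:", summary["best_value"])
print("spread across settings:", summary["spread_across_settings"])
for value, (m, sd) in summary["per_setting"].items():
    print(f"lr={value:g}: mean={m:.1f} seed std={sd:.1f}")
```

In this made-up sweep the largest learning rate has a reasonable mean but a much larger seed-to-seed standard deviation, which is the kind of brittleness a maximum-score table would hide.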

Constraints

Do not recommend advanced RL algorithms without explaining why they fit the environment.
Do not trust a single seed.
Do not present unstable reward curves without discussing variance and failure modes.
Distinguish training reward from evaluation performance.
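The last constraint can be illustrated with a toy loop that logs exploratory training episodes and greedy evaluation episodes separately. This is a sketch under stated assumptions, not a recommended setup: the 5-state chain environment, tabular Q-learning, and all names and hyperparameter values below are hypothetical choices made to keep the example self-contained.

```python
import random

N_STATES, GOAL, MAX_STEPS = 5, 4, 20  # hypothetical chain: start at 0, goal at 4


def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left (clamped at 0)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick an action with the highest value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def run_episode(q, rng, epsilon, learn, alpha=0.5, gamma=0.9):
    """Run one episode; explore and update the table only when learn is True."""
    state, steps, done = 0, 0, False
    while not done and steps < MAX_STEPS:
        if learn and rng.random() < epsilon:
            action = rng.randrange(2)          # exploratory action
        else:
            action = greedy(q[state], rng)
        nxt, reward, done = step(state, action)
        if learn:
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
        state, steps = nxt, steps + 1
    return steps, done


def train(episodes=500, eval_every=100, epsilon=0.3, seed=0):
    train_rng = random.Random(seed)
    eval_rng = random.Random(seed + 10_000)    # separate stream for evaluation
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    train_log, eval_log = [], []
    for episode in range(1, episodes + 1):
        train_log.append(run_episode(q, train_rng, epsilon, learn=True))
        if episode % eval_every == 0:
            # Evaluation: no exploration, no learning, logged separately.
            eval_log.append((episode, run_episode(q, eval_rng, 0.0, learn=False)))
    return train_log, eval_log


train_log, eval_log = train()
recent = [steps for steps, _ in train_log[-100:]]
print("mean training episode length (last 100):", sum(recent) / len(recent))
print("final greedy evaluation (steps, reached goal):", eval_log[-1][1])
```

Training episodes include exploratory actions, so their lengths tend to sit above the shortest path of four steps, while the greedy evaluation episode measures the learned policy itself. Reporting only the training log would understate what the policy has learned; reporting only the evaluation log would hide how costly exploration was.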

Output

Provide:

algorithm recommendation
baseline plan
hyperparameters to try
plots to generate
output analysis and interpretation guidance

Use when designing, implementing, or analyzing reinforcement learning experiments, algorithm selection, environment analysis, hyperparameter tuning, reward curves, and RL evaluation.

development

Updated Apr 16, 2026

Install this skill globally with one command:

npx skillsauth add miaodi/llm_config reinforcement-learning

Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 16, 2026, 12:57 PM (21.1s, 1 file scanned)

SKILL.md

name: reinforcement-learning
description: Use when designing, implementing, or analyzing reinforcement learning experiments, algorithm selection, environment analysis, hyperparameter tuning, reward curves, and RL evaluation.

Related Skills

miaodi/computational-learning-notes
development
Verified, Trusted, Community
Use when creating C++ learning notes or minimal experiments for low-level computational, numerical, CPU/GPU, compiler, and hardware concepts such as false sharing, floating point, registers, caches, SIMD, atomics, numerical stability, and benchmarking pitfalls.
SKILL.md, updated Jun 2, 2026

miaodi/latex-project-build
development
Verified, Trusted, Community
Use when configuring, diagnosing, or compiling LaTeX projects, especially multi-file reports, theses, books, chapter-based projects, Overleaf exports, latexmk/arara/Makefile workflows, bibliography/index/glossary passes, or projects that require pdflatex, xelatex, lualatex, latex->dvips, biber, or bibtex.
SKILL.md, updated May 28, 2026

miaodi/graph-algorithms
development
Verified, Trusted, Community
Use when working with graph traversals (BFS, DFS, level-order), minimum spanning trees, strongly connected components, topological sort, graph coloring, bipartite detection, elimination trees, level-set extraction, parallel graph algorithms, task-tree parallelism, sparse graph representations, and exploiting graph structure for parallel sparse computations.
SKILL.md, updated May 21, 2026

miaodi/git-workflow
testing
Verified, Trusted, Community
Use when planning or executing Git branch workflows, especially merge/rebase across branches, conflict resolution, safe history rewriting, and recovery from mistakes.
SKILL.md, updated May 21, 2026

Install manually

# Clone the repo
git clone https://github.com/miaodi/llm_config.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r llm_config/skills/reinforcement-learning ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

miaodi/llm_config

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT