skills/dnyoussef/agentdb-reinforcement-learning-training/SKILL.md
Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production.
npx skillsauth add aiskillstore/marketplace agentdb-reinforcement-learning-trainingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Train AI learning plugins with AgentDB's 9 reinforcement learning algorithms including Decision Transformer, Q-Learning, SARSA, Actor-Critic, PPO, and more. Build self-learning agents, implement RL, and optimize agent behavior through experience.
Use this skill when you need to:
Objective: Setup AgentDB learning infrastructure with environment configuration
Agent: ml-developer
Steps:
npm install agentdb-learning@latest
npm install @agentdb/rl-algorithms @agentdb/environments
import { AgentDB, LearningPlugin } from 'agentdb-learning';
const learningDB = new AgentDB({
name: 'rl-training-db',
dimensions: 512, // State embedding dimension
learning: {
enabled: true,
persistExperience: true,
replayBufferSize: 100000
}
});
await learningDB.initialize();
// Create learning plugin
const learningPlugin = new LearningPlugin({
database: learningDB,
algorithms: ['q-learning', 'dqn', 'ppo', 'actor-critic'],
config: {
batchSize: 64,
learningRate: 0.001,
discountFactor: 0.99,
explorationRate: 1.0,
explorationDecay: 0.995
}
});
await learningPlugin.initialize();
import { Environment } from '@agentdb/environments';
const environment = new Environment({
name: 'grid-world',
stateSpace: {
type: 'continuous',
shape: [10, 10],
bounds: [[0, 10], [0, 10]]
},
actionSpace: {
type: 'discrete',
actions: ['up', 'down', 'left', 'right']
},
rewardFunction: (state, action, nextState) => {
// Distance to goal reward
const goalDistance = Math.sqrt(
Math.pow(nextState[0] - 9, 2) +
Math.pow(nextState[1] - 9, 2)
);
return -goalDistance + (goalDistance === 0 ? 100 : 0);
},
terminalCondition: (state) => {
return state[0] === 9 && state[1] === 9; // Reached goal
}
});
await environment.initialize();
const monitor = learningPlugin.createMonitor({
metrics: ['reward', 'loss', 'exploration-rate', 'episode-length'],
logInterval: 100, // Log every 100 episodes
saveCheckpoints: true,
checkpointInterval: 1000
});
monitor.on('episode-complete', (episode) => {
console.log('Episode:', episode.number, 'Reward:', episode.totalReward);
});
Memory Pattern:
await agentDB.memory.store('agentdb/learning/environment', {
name: environment.name,
stateSpace: environment.stateSpace,
actionSpace: environment.actionSpace,
initialized: Date.now()
});
Validation:
Objective: Select and configure RL algorithm for the learning task
Agent: ml-developer
Steps:
// Example: Deep Q-Network (DQN)
const dqnAgent = learningPlugin.createAgent({
algorithm: 'dqn',
config: {
networkArchitecture: {
layers: [
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: environment.actionSpace.size, activation: 'linear' }
]
},
learningRate: 0.001,
batchSize: 64,
replayBuffer: {
size: 100000,
prioritized: true,
alpha: 0.6,
beta: 0.4
},
targetNetwork: {
updateFrequency: 1000,
tauSync: 0.001 // Soft update
},
exploration: {
initial: 1.0,
final: 0.01,
decay: 0.995
},
training: {
startAfter: 1000, // Start training after 1000 experiences
updateFrequency: 4
}
}
});
await dqnAgent.initialize();
const hyperparameters = {
// Learning parameters
learningRate: 0.001,
discountFactor: 0.99, // Gamma
batchSize: 64,
// Exploration
epsilonStart: 1.0,
epsilonEnd: 0.01,
epsilonDecay: 0.995,
// Experience replay
replayBufferSize: 100000,
minReplaySize: 1000,
prioritizedReplay: true,
// Training
maxEpisodes: 10000,
maxStepsPerEpisode: 1000,
targetUpdateFrequency: 1000,
// Evaluation
evalFrequency: 100,
evalEpisodes: 10
};
dqnAgent.setHyperparameters(hyperparameters);
import { PrioritizedReplayBuffer } from '@agentdb/rl-algorithms';
const replayBuffer = new PrioritizedReplayBuffer({
capacity: 100000,
alpha: 0.6, // Prioritization exponent
beta: 0.4, // Importance sampling
betaIncrement: 0.001,
epsilon: 0.01 // Small constant for stability
});
dqnAgent.setReplayBuffer(replayBuffer);
const trainingConfig = {
episodes: 10000,
stepsPerEpisode: 1000,
warmupSteps: 1000,
trainFrequency: 4,
targetUpdateFrequency: 1000,
saveFrequency: 1000,
evalFrequency: 100,
earlyStoppingPatience: 500,
earlyStoppingThreshold: 0.01
};
dqnAgent.setTrainingConfig(trainingConfig);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/algorithm-config', {
algorithm: 'dqn',
hyperparameters: hyperparameters,
trainingConfig: trainingConfig,
configured: Date.now()
});
Validation:
Objective: Execute training iterations and optimize agent behavior
Agent: safla-neural
Steps:
async function trainAgent() {
console.log('Starting RL training...');
const trainingStats = {
episodes: [],
totalReward: [],
episodeLength: [],
loss: [],
explorationRate: []
};
for (let episode = 0; episode < trainingConfig.episodes; episode++) {
let state = await environment.reset();
let episodeReward = 0;
let episodeLength = 0;
let episodeLoss = 0;
for (let step = 0; step < trainingConfig.stepsPerEpisode; step++) {
// Select action
const action = await dqnAgent.selectAction(state, {
explore: true
});
// Execute action
const { nextState, reward, done } = await environment.step(action);
// Store experience
await dqnAgent.storeExperience({
state,
action,
reward,
nextState,
done
});
// Train if enough experiences
if (dqnAgent.canTrain()) {
const loss = await dqnAgent.train();
episodeLoss += loss;
}
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) break;
}
// Update target network
if (episode % trainingConfig.targetUpdateFrequency === 0) {
await dqnAgent.updateTargetNetwork();
}
// Decay exploration
dqnAgent.decayExploration();
// Log progress
trainingStats.episodes.push(episode);
trainingStats.totalReward.push(episodeReward);
trainingStats.episodeLength.push(episodeLength);
trainingStats.loss.push(episodeLoss / episodeLength);
trainingStats.explorationRate.push(dqnAgent.getExplorationRate());
if (episode % 100 === 0) {
console.log(`Episode ${episode}:`, {
reward: episodeReward.toFixed(2),
length: episodeLength,
loss: (episodeLoss / episodeLength).toFixed(4),
epsilon: dqnAgent.getExplorationRate().toFixed(3)
});
}
// Save checkpoint
if (episode % trainingConfig.saveFrequency === 0) {
await dqnAgent.save(`checkpoint-${episode}`);
}
// Evaluate
if (episode % trainingConfig.evalFrequency === 0) {
const evalReward = await evaluateAgent(dqnAgent, environment);
console.log(`Evaluation at episode ${episode}: ${evalReward.toFixed(2)}`);
}
// Early stopping
if (checkEarlyStopping(trainingStats, episode)) {
console.log('Early stopping triggered');
break;
}
}
return trainingStats;
}
const trainingStats = await trainAgent();
monitor.on('training-update', (stats) => {
// Calculate moving averages
const window = 100;
const recentRewards = stats.totalReward.slice(-window);
const avgReward = recentRewards.reduce((a, b) => a + b, 0) / recentRewards.length;
// Store metrics
agentDB.memory.store('agentdb/learning/training-progress', {
episode: stats.episodes[stats.episodes.length - 1],
avgReward: avgReward,
explorationRate: stats.explorationRate[stats.explorationRate.length - 1],
timestamp: Date.now()
});
// Plot learning curve (if visualization enabled)
if (monitor.visualization) {
monitor.plot('reward-curve', stats.episodes, stats.totalReward);
monitor.plot('loss-curve', stats.episodes, stats.loss);
}
});
function checkConvergence(stats, windowSize = 100, threshold = 0.01) {
if (stats.totalReward.length < windowSize * 2) {
return false;
}
const recent = stats.totalReward.slice(-windowSize);
const previous = stats.totalReward.slice(-windowSize * 2, -windowSize);
const recentAvg = recent.reduce((a, b) => a + b, 0) / recent.length;
const previousAvg = previous.reduce((a, b) => a + b, 0) / previous.length;
const improvement = (recentAvg - previousAvg) / Math.abs(previousAvg);
return improvement < threshold;
}
await dqnAgent.save('trained-agent-final', {
includeReplayBuffer: false,
includeOptimizer: false,
metadata: {
trainingStats: trainingStats,
hyperparameters: hyperparameters,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1]
}
});
console.log('Training complete. Model saved.');
Memory Pattern:
await agentDB.memory.store('agentdb/learning/training-results', {
algorithm: 'dqn',
episodes: trainingStats.episodes.length,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1],
converged: checkConvergence(trainingStats),
modelPath: 'trained-agent-final',
timestamp: Date.now()
});
Validation:
Objective: Benchmark trained agent and validate performance
Agent: performance-benchmarker
Steps:
const trainedAgent = await learningPlugin.loadAgent('trained-agent-final');
async function evaluateAgent(agent, env, numEpisodes = 100) {
const results = {
rewards: [],
episodeLengths: [],
successRate: 0
};
for (let i = 0; i < numEpisodes; i++) {
let state = await env.reset();
let episodeReward = 0;
let episodeLength = 0;
let success = false;
for (let step = 0; step < 1000; step++) {
const action = await agent.selectAction(state, { explore: false });
const { nextState, reward, done } = await env.step(action);
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) {
success = env.isSuccessful(state);
break;
}
}
results.rewards.push(episodeReward);
results.episodeLengths.push(episodeLength);
if (success) results.successRate += 1;
}
results.successRate /= numEpisodes;
return {
meanReward: results.rewards.reduce((a, b) => a + b, 0) / results.rewards.length,
stdReward: calculateStd(results.rewards),
meanLength: results.episodeLengths.reduce((a, b) => a + b, 0) / results.episodeLengths.length,
successRate: results.successRate,
results: results
};
}
const evalResults = await evaluateAgent(trainedAgent, environment, 100);
console.log('Evaluation results:', evalResults);
// Random policy baseline
const randomAgent = learningPlugin.createAgent({ algorithm: 'random' });
const randomResults = await evaluateAgent(randomAgent, environment, 100);
// Calculate improvement
const improvement = {
rewardImprovement: (evalResults.meanReward - randomResults.meanReward) / Math.abs(randomResults.meanReward),
lengthImprovement: (randomResults.meanLength - evalResults.meanLength) / randomResults.meanLength,
successImprovement: evalResults.successRate - randomResults.successRate
};
console.log('Improvement over random:', improvement);
const benchmarks = {
performanceMetrics: {
meanReward: evalResults.meanReward,
stdReward: evalResults.stdReward,
successRate: evalResults.successRate,
meanEpisodeLength: evalResults.meanLength
},
algorithmComparison: {
dqn: evalResults,
random: randomResults,
improvement: improvement
},
inferenceTiming: {
actionSelection: 0,
totalEpisode: 0
}
};
// Measure inference speed
const timingTrials = 1000;
const startTime = performance.now();
for (let i = 0; i < timingTrials; i++) {
const state = await environment.randomState();
await trainedAgent.selectAction(state, { explore: false });
}
const endTime = performance.now();
benchmarks.inferenceTiming.actionSelection = (endTime - startTime) / timingTrials;
await agentDB.memory.store('agentdb/learning/benchmarks', benchmarks);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/validation', {
evaluated: true,
meanReward: evalResults.meanReward,
successRate: evalResults.successRate,
improvement: improvement,
timestamp: Date.now()
});
Validation:
Objective: Deploy trained agents to production environment
Agent: ml-developer
Steps:
await trainedAgent.export('production-agent', {
format: 'onnx', // or 'tensorflowjs', 'pytorch'
optimize: true,
quantize: 'int8', // Quantization for faster inference
includeMetadata: true
});
import express from 'express';
const app = express();
app.use(express.json());
// Load production agent
const productionAgent = await learningPlugin.loadAgent('production-agent');
app.post('/api/predict', async (req, res) => {
try {
const { state } = req.body;
const action = await productionAgent.selectAction(state, {
explore: false,
returnProbabilities: true
});
res.json({
action: action.action,
probabilities: action.probabilities,
confidence: action.confidence
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000, () => {
console.log('RL agent API running on port 3000');
});
import { ProductionMonitor } from '@agentdb/monitoring';
const prodMonitor = new ProductionMonitor({
agent: productionAgent,
metrics: ['inference-latency', 'action-distribution', 'reward-feedback'],
alerting: {
latencyThreshold: 100, // ms
anomalyDetection: true
}
});
await prodMonitor.start();
const deploymentPipeline = {
stages: [
{
name: 'validation',
steps: [
'Load trained model',
'Run validation suite',
'Check performance metrics',
'Verify inference speed'
]
},
{
name: 'export',
steps: [
'Export to production format',
'Optimize model',
'Quantize weights',
'Package artifacts'
]
},
{
name: 'deployment',
steps: [
'Deploy to staging',
'Run smoke tests',
'Deploy to production',
'Monitor performance'
]
}
]
};
await agentDB.memory.store('agentdb/learning/deployment-pipeline', deploymentPipeline);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/production', {
deployed: true,
modelPath: 'production-agent',
apiEndpoint: 'http://localhost:3000/api/predict',
monitoring: true,
timestamp: Date.now()
});
Validation:
#!/bin/bash
# train-rl-agent.sh
set -e
echo "AgentDB RL Training Script"
echo "=========================="
# Phase 1: Initialize
echo "Phase 1: Initializing learning environment..."
npm install agentdb-learning @agentdb/rl-algorithms
# Phase 2: Configure
echo "Phase 2: Configuring algorithm..."
node -e "require('./config-algorithm.js')"
# Phase 3: Train
echo "Phase 3: Training agent..."
node -e "require('./train-agent.js')"
# Phase 4: Validate
echo "Phase 4: Validating performance..."
node -e "require('./evaluate-agent.js')"
# Phase 5: Deploy
echo "Phase 5: Deploying to production..."
node -e "require('./deploy-agent.js')"
echo "Training complete!"
// quickstart-rl.ts
import { setupRLTraining } from './setup';
async function quickStart() {
console.log('Starting RL training quick setup...');
// Setup
const { learningDB, environment, agent } = await setupRLTraining({
algorithm: 'dqn',
environment: 'grid-world',
episodes: 1000
});
// Train
console.log('Training agent...');
const stats = await agent.train(environment, {
episodes: 1000,
logInterval: 100
});
// Evaluate
console.log('Evaluating agent...');
const results = await agent.evaluate(environment, {
episodes: 100
});
console.log('Results:', results);
// Save
await agent.save('quickstart-agent');
console.log('Quick start complete!');
}
quickStart().catch(console.error);
Training Convergence (Self-Consistency)
Performance Benchmarks (Quantitative)
Algorithm Validation (Chain-of-Verification)
Production Readiness (Multi-Agent Consensus)
development
Apple Human Interface Guidelines for content display components. Use this skill when the user asks about charts component, collection view, image view, web view, color well, image well, activity view, lockup, data visualization, content display, displaying images, rendering web content, color pickers, or presenting collections of items in Apple apps. Also use when the user says how should I display charts, what's the best way to show images, should I use a web view, how do I build a grid of items, what component shows media, or how do I present a share sheet. Cross-references: hig-foundations for color/typography/accessibility, hig-patterns for data visualization patterns, hig-components-layout for structural containers, hig-platforms for platform-specific component behavior.
tools
Automate HelpDesk tasks via Rube MCP (Composio): list tickets, manage views, use canned responses, and configure custom fields. Always search tools first for current schemas.
testing
Expert Haskell engineer specializing in advanced type systems, pure functional design, and high-reliability software. Use PROACTIVELY for type-level programming, concurrency, and architecture guidance.
tools
GraphQL gives clients exactly the data they need - no more, no less. One endpoint, typed schema, introspection. But the flexibility that makes it powerful also makes it dangerous. Without proper controls, clients can craft queries that bring down your server. This skill covers schema design, resolvers, DataLoader for N+1 prevention, federation for microservices, and client integration with Apollo/urql. Key insight: GraphQL is a contract. The schema is the API documentation. Design it carefully.