skills/rl-execution/SKILL.md
Reinforcement learning for trade execution optimization including order splitting, adaptive timing, and impact minimization
Reinforcement learning (RL) for trade execution teaches an agent to split and time large orders so that total market impact is minimized. Instead of following a fixed schedule (TWAP, VWAP), an RL agent observes real-time market state and adapts its trading rate on the fly.
Every trade has a cost beyond the quoted spread:
| Cost Component | Cause | Typical Magnitude |
|---|---|---|
| Spread cost | Crossing the bid-ask | 5-50 bps on DEXs |
| Temporary impact | Consuming liquidity | Scales with trade rate |
| Permanent impact | Information leakage | Scales with total size |
| Timing risk | Price drifts while waiting | Scales with volatility and time |
A 100 SOL market buy on a thin pool can move the price 2-5%. Splitting it into ten 10 SOL slices over a few minutes can cut that cost by 30-60%. The question is how to split optimally — and that is where execution algorithms and RL come in.
The agent observes at each decision step:
state = [
    remaining_qty,    # How much is left to trade (0-1 normalized)
    time_remaining,   # Fraction of allowed horizon remaining
    current_price,    # Current mid-price (normalized to arrival price)
    spread,           # Current bid-ask spread
    volatility,       # Recent realized volatility
    volume,           # Recent trading volume (normalized)
]
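A minimal sketch of how the six features above might be assembled into a normalized observation. The `MarketSnapshot` container, the `build_state` helper, and all argument names are hypothetical and not taken from the skill's scripts; the sketch assumes the spread is expressed relative to the mid-price and that volume is normalized by a trailing average.

```python
from dataclasses import dataclass


@dataclass
class MarketSnapshot:
    """Raw market inputs for one decision step (hypothetical container)."""
    mid_price: float
    bid: float
    ask: float
    realized_vol: float  # recent realized volatility
    volume: float        # volume traded over the recent window
    avg_volume: float    # trailing average volume used for normalization


def build_state(remaining_qty, total_qty, steps_left, total_steps,
                arrival_price, snap):
    """Return the six-element state vector in the order listed above."""
    return [
        remaining_qty / total_qty,                # remaining_qty in [0, 1]
        steps_left / total_steps,                 # time_remaining in [0, 1]
        snap.mid_price / arrival_price,           # price relative to arrival
        (snap.ask - snap.bid) / snap.mid_price,   # relative bid-ask spread
        snap.realized_vol,                        # volatility, passed through
        snap.volume / snap.avg_volume,            # normalized volume
    ]


snap = MarketSnapshot(mid_price=101.0, bid=100.9, ask=101.1,
                      realized_vol=0.02, volume=500.0, avg_volume=1000.0)
state = build_state(remaining_qty=60.0, total_qty=100.0, steps_left=4,
                    total_steps=10, arrival_price=100.0, snap=snap)
```

Keeping every feature roughly on a unit scale tends to make value-function approximation better behaved, which is why the price and volume are expressed as ratios here.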
Discrete actions controlling how much to trade this step:
actions = [0%, 10%, 25%, 50%, 100%] # of remaining quantity
A small action space keeps the problem tractable. Each action represents the fraction of the remaining order to execute in the current time step.
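As a sketch, the mapping from a discrete action index to a trade size could look like the following. The `action_to_quantity` helper is hypothetical, and the forced fill on the final step is an assumption of this example (the source does not say how leftover quantity is handled at the deadline).

```python
# Fractions of the *remaining* quantity, matching the action list above.
ACTION_FRACTIONS = [0.0, 0.10, 0.25, 0.50, 1.00]


def action_to_quantity(action_index, remaining_qty, steps_left):
    """Map a discrete action to the quantity traded in this step.

    Assumption: on the last step everything left is traded, so the
    order always completes within the horizon.
    """
    if steps_left <= 1:
        return remaining_qty
    return ACTION_FRACTIONS[action_index] * remaining_qty


qty_mid = action_to_quantity(2, remaining_qty=80.0, steps_left=5)   # 25% of 80
qty_last = action_to_quantity(0, remaining_qty=80.0, steps_left=1)  # forced fill
```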
The reward penalizes execution cost relative to a benchmark:
reward = -(execution_price - arrival_price) * quantity_traded
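The formula above is written for a buy order, where paying more than the arrival price is a cost; for a sell order the sign of the price difference flips. Summed over all steps, the total reward equals the negative implementation shortfall, so maximizing reward is the same as minimizing total execution cost.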
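The relationship between per-step rewards and implementation shortfall can be checked with a small worked example. The `step_reward` function and its `side` argument are illustrative assumptions; the fills are made-up numbers, not simulator output.

```python
def step_reward(execution_price, arrival_price, quantity_traded, side="buy"):
    """Per-step reward: negative cost relative to the arrival price."""
    sign = 1.0 if side == "buy" else -1.0  # a sell is hurt by lower prices
    return -sign * (execution_price - arrival_price) * quantity_traded


arrival = 100.0
fills = [(100.5, 40.0), (101.0, 60.0)]  # (execution price, quantity) per step

total_reward = sum(step_reward(p, arrival, q) for p, q in fills)

# Implementation shortfall for a buy: amount paid minus the cost at arrival.
paid = sum(p * q for p, q in fills)
shortfall = paid - arrival * sum(q for _, q in fills)
```

Here the two fills cost 20 and 60 more than the arrival benchmark, so the shortfall is 80 and the summed reward is -80.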
One episode = one order from placement to completion:
The simplest baseline — split the order equally across all time steps:
trade_per_step = total_quantity / num_steps
Pros: Simple, deterministic, easy to implement. Cons: Ignores market conditions entirely.
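A minimal TWAP schedule, as a sketch (the `twap_schedule` name is hypothetical):

```python
def twap_schedule(total_quantity, num_steps):
    """Split an order into equal slices, one per time step."""
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    return [total_quantity / num_steps] * num_steps


schedule = twap_schedule(total_quantity=100.0, num_steps=10)  # ten slices of 10
```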
Split proportional to expected volume in each period:
trade_at_step_t = total_quantity * (expected_volume[t] / total_expected_volume)
Pros: Trades more when liquidity is available. Cons: Requires accurate volume forecasts; still non-adaptive.
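The VWAP formula above translates directly into code. In this sketch the `vwap_schedule` name and the example volume profile are hypothetical; in practice the profile would come from a volume forecast.

```python
def vwap_schedule(total_quantity, expected_volume):
    """Split an order in proportion to the expected volume of each period."""
    total_expected = sum(expected_volume)
    if total_expected <= 0:
        raise ValueError("expected volume must sum to a positive number")
    return [total_quantity * v / total_expected for v in expected_volume]


# Hypothetical U-shaped profile: busy first and last periods, quiet middle.
profile = [300.0, 100.0, 100.0, 500.0]
schedule = vwap_schedule(total_quantity=100.0, expected_volume=profile)
```

With this profile the order is split 30 / 10 / 10 / 50, so the largest slice lands in the period with the most expected liquidity.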
The foundational analytical model. Minimizes a combination of execution cost and timing risk:
minimize: E[cost] + λ * Var[cost]
With linear impact assumptions, this yields a closed-form optimal trajectory.
See references/execution_algorithms.md for the full derivation.
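For orientation, here is a sketch of the trajectory in its commonly cited continuous-time form, where remaining holdings follow x(t) = X · sinh(κ(T − t)) / sinh(κT) with κ = sqrt(λσ² / η). This is an assumption about parameterization: it may not match the conventions used in scripts/almgren_chriss.py, and the function name and parameter values below are illustrative only.

```python
import math


def almgren_chriss_holdings(total_qty, num_steps, horizon,
                            sigma, eta, risk_aversion):
    """Remaining holdings at t_0..t_N (continuous-time approximation).

    sigma: price volatility, eta: temporary impact coefficient,
    risk_aversion: the lambda weight on cost variance.
    """
    if risk_aversion <= 0:
        # Risk-neutral limit: linear liquidation, i.e. the TWAP schedule.
        return [total_qty * (1 - k / num_steps) for k in range(num_steps + 1)]
    kappa = math.sqrt(risk_aversion * sigma ** 2 / eta)
    dt = horizon / num_steps
    denom = math.sinh(kappa * horizon)
    return [total_qty * math.sinh(kappa * (horizon - k * dt)) / denom
            for k in range(num_steps + 1)]


holdings = almgren_chriss_holdings(total_qty=100.0, num_steps=5, horizon=1.0,
                                   sigma=0.3, eta=0.01, risk_aversion=1.0)
# Quantity traded in each interval is the drop in holdings.
trades = [holdings[k] - holdings[k + 1] for k in range(5)]
```

A larger λ increases κ and front-loads the schedule, trading impact cost for lower timing risk; as λ approaches zero the trajectory flattens toward TWAP.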
An RL agent (DQN, PPO, or similar) that learns the execution policy from simulated experience:
# Pseudocode training loop
for episode in range(num_episodes):
    state = env.reset(order_qty=Q, horizon=T)
    done = False
    while not done:
        action = agent.select_action(state)
        next_state, reward, done, info = env.step(action)
        agent.store_transition(state, action, reward, next_state, done)
        agent.update()
        state = next_state
Pros: Adapts to current market conditions, can learn non-linear patterns. Cons: Requires realistic simulator, sim-to-real gap, training instability.
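To make the agent interface in the training loop concrete, here is a deliberately simple stand-in: epsilon-greedy tabular Q-learning rather than the DQN or PPO agents named above. The class name, hyperparameters, and the choice to discretize only the first two state features are all assumptions of this sketch.

```python
import random
from collections import defaultdict


class TabularQAgent:
    """Epsilon-greedy tabular Q-learning (a simple stand-in for DQN/PPO)."""

    def __init__(self, num_actions=5, alpha=0.1, discount=1.0,
                 epsilon=0.1, bins=10, seed=0):
        self.num_actions = num_actions
        self.alpha = alpha          # learning rate
        self.discount = discount    # 1.0 suits short, finite episodes
        self.epsilon = epsilon      # exploration probability
        self.bins = bins
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * num_actions)
        self.last = None

    def _key(self, state):
        # Discretize only remaining quantity and time remaining (both in [0, 1]).
        return tuple(min(int(x * self.bins), self.bins - 1) for x in state[:2])

    def select_action(self, state):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        values = self.q[self._key(state)]
        return max(range(self.num_actions), key=values.__getitem__)

    def store_transition(self, state, action, reward, next_state, done):
        # No replay buffer here: keep only the most recent transition.
        self.last = (state, action, reward, next_state, done)

    def update(self):
        if self.last is None:
            return
        state, action, reward, next_state, done = self.last
        target = reward
        if not done:
            target += self.discount * max(self.q[self._key(next_state)])
        values = self.q[self._key(state)]
        values[action] += self.alpha * (target - values[action])
        self.last = None


agent = TabularQAgent(epsilon=0.0)           # greedy, for a deterministic demo
s0, s1 = (1.0, 1.0), (0.5, 0.9)
first = agent.select_action(s0)              # all values tie, picks index 0
agent.store_transition(s0, first, -2.0, s1, False)
agent.update()                               # action 0 now looks costly
second = agent.select_action(s0)
```

A table like this ignores price, spread, volatility, and volume, so it cannot adapt to market conditions; a function approximator over the full state is what gives the RL approach its advantage.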
The simulator uses a standard two-component impact model:
temporary_impact = η * (trade_rate / avg_volume)
permanent_impact = γ * (trade_rate / avg_volume)
The execution price and the next mid-price for a buy of size q at time t (for a sell, the impact terms are subtracted rather than added):
exec_price = mid_price + permanent_impact + temporary_impact
mid_price_next = mid_price + permanent_impact + noise
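The impact equations above can be wrapped in a toy environment with the same `reset`/`step` interface as the training loop. This is a sketch under stated assumptions, not the contents of scripts/execution_simulator.py: the class name, default coefficients, Gaussian noise, three-feature state, `info` keys, and the forced fill on the last step are all illustrative choices, and only buy orders are modeled.

```python
import random

ACTION_FRACTIONS = [0.0, 0.10, 0.25, 0.50, 1.00]


class ToyExecutionEnv:
    """Toy buy-order simulator with linear temporary and permanent impact."""

    def __init__(self, eta=0.5, gamma=0.1, avg_volume=1000.0,
                 noise_std=0.05, seed=0):
        self.eta = eta                # temporary impact coefficient
        self.gamma = gamma            # permanent impact coefficient
        self.avg_volume = avg_volume
        self.noise_std = noise_std
        self.rng = random.Random(seed)

    def reset(self, order_qty, horizon, arrival_price=100.0):
        self.total_qty = order_qty
        self.remaining = order_qty
        self.horizon = horizon
        self.steps_left = horizon
        self.arrival_price = arrival_price
        self.mid_price = arrival_price
        return self._state()

    def _state(self):
        # Reduced state: remaining qty, time remaining, normalized price.
        return (self.remaining / self.total_qty,
                self.steps_left / self.horizon,
                self.mid_price / self.arrival_price)

    def step(self, action):
        # Assumption: everything left is traded on the final step.
        fraction = 1.0 if self.steps_left == 1 else ACTION_FRACTIONS[action]
        qty = fraction * self.remaining
        rate = qty / self.avg_volume
        temporary = self.eta * rate
        permanent = self.gamma * rate
        exec_price = self.mid_price + permanent + temporary
        reward = -(exec_price - self.arrival_price) * qty
        # Only the permanent component (plus noise) carries into the next mid.
        self.mid_price += permanent + self.rng.gauss(0.0, self.noise_std)
        self.remaining -= qty
        self.steps_left -= 1
        done = self.steps_left == 0 or self.remaining <= 1e-12
        return self._state(), reward, done, {"exec_price": exec_price, "qty": qty}


def run_episode(env, policy, order_qty=100.0, horizon=10):
    """Roll out a fixed policy and return (total reward, quantity filled)."""
    state = env.reset(order_qty=order_qty, horizon=horizon)
    total_reward, filled, done = 0.0, 0.0, False
    while not done:
        state, reward, done, info = env.step(policy(state))
        total_reward += reward
        filled += info["qty"]
    return total_reward, filled


# Noise-free comparison: trade everything at once vs. 25% of the remainder per step.
dump_reward, dump_filled = run_episode(ToyExecutionEnv(noise_std=0.0), lambda s: 4)
slow_reward, slow_filled = run_episode(ToyExecutionEnv(noise_std=0.0), lambda s: 2)
```

With noise switched off, the sliced policy earns a higher (less negative) reward than the single market order, because temporary impact is paid on smaller trade rates. With noise on, slower schedules also take on timing risk, which is the trade-off the agent has to learn.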
This skill is most valuable when:
For small retail orders (<$1,000 on liquid pairs), simple market orders or
basic slippage limits are sufficient. See the slippage-modeling skill instead.
| Skill | Integration |
|---|---|
| slippage-modeling | Provides impact estimates to calibrate the simulator |
| position-sizing | Determines the total order size to execute |
| liquidity-analysis | Assesses available liquidity for realistic simulation |
| volatility-modeling | Supplies volatility estimates for the state vector |
| jupiter-swap | Actual on-chain execution of the computed trade schedule |
python scripts/execution_simulator.py
Runs TWAP, VWAP, and adaptive strategies in a simulated market and compares execution costs across many trials.
python scripts/almgren_chriss.py
Computes the analytically optimal execution trajectory and compares it to TWAP for a given set of market parameters.
- references/execution_algorithms.md: TWAP, VWAP, Almgren-Chriss, IS, and RL execution algorithms with formulas and comparison
- references/rl_framework.md: MDP formulation, environment design, training methodology, and practical considerations for RL execution
- scripts/execution_simulator.py: Simulated order execution comparing TWAP, VWAP, and adaptive strategies with price impact
- scripts/almgren_chriss.py: Almgren-Chriss optimal execution model with trajectory computation and cost analysis

uv pip install numpy
No API keys required — all scripts run in simulation/demo mode.
This skill provides educational analysis tools for studying execution algorithms. It does not constitute financial advice. Simulated results do not guarantee real-world performance. Always test execution strategies with small sizes before scaling up.