skills/agentcgroup-understanding-controlling-os/SKILL.md
Design and implement OS-level resource controls for sandboxed AI agents using hierarchical cgroups, eBPF enforcement, and tool-call-level resource management. Use when: 'set up cgroups for AI agent containers', 'control memory for coding agents', 'isolate tool-call resources with eBPF', 'manage multi-tenant agent resource limits', 'prevent OOM kills in agent sandboxes', 'configure agent resource policies with cgroup v2'.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add ndpvt-web/arxiv-claude-skills agentcgroup-understanding-controlling-os
This skill enables Claude to design, configure, and implement Linux cgroup v2 hierarchies and eBPF-based enforcement policies specifically tuned for AI agent workloads. Unlike traditional container resource limits that operate at a single level, this approach creates per-tool-call sub-cgroups, uses in-kernel eBPF hooks for sub-second enforcement, and applies graduated responses (throttle before kill) to handle the extreme memory spikes and unpredictable resource demands characteristic of AI coding agents.
AI coding agents have a fundamentally different resource profile than web services, batch jobs, or serverless functions. Research on 144 SWE-bench tasks shows that OS-level execution (tool calls, container init, agent init) accounts for 56-74% of end-to-end latency, and memory -- not CPU -- is the primary concurrency bottleneck. Memory spikes are driven by individual tool calls (running pytest, git diff, compilation) and exhibit a peak-to-average ratio of up to 15.4x, with bursts lasting only 1-2 seconds. Critically, these patterns are non-deterministic: the same task on the same model produces 1.8x variance across runs.
This creates three mismatches with existing resource controls. First, a granularity mismatch: container-level limits (Kubernetes QoS, Docker --memory) apply one budget to the entire agent session, but demands fluctuate per tool call -- a static limit that accommodates peaks wastes 93% of allocated memory on average. Second, a responsiveness mismatch: user-space controllers (kubelet, custom autoscalers) react on millisecond-to-minute timescales via polling, but agent bursts are sub-second and unpredictable, so the controller misses the window entirely. Third, an adaptability mismatch: history-based prediction fails because agent workloads are stateful and non-deterministic, and an OOM kill destroys minutes of accumulated LLM context that cannot be cheaply re-created.
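The waste figure can be related to the peak-to-average ratio with simple arithmetic. As a rough illustration (not the paper's own calculation), a static limit sized for the peak leaves a fraction 1 - 1/r of the allocation idle on average, where r is the peak-to-average ratio. The helper name `static_limit_waste_pct` is made up for this sketch.

```bash
# Share of a peak-sized static limit that sits idle on average,
# given the peak-to-average memory ratio r: waste = (1 - 1/r) * 100.
static_limit_waste_pct() {
    awk -v r="$1" 'BEGIN { printf "%.1f\n", (1 - 1 / r) * 100 }'
}
```

Calling `static_limit_waste_pct 15.4` with the worst-case ratio quoted above gives a value of the same order as the 93% average waste.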
AgentCgroup addresses these through three mechanisms: (1) a hierarchical cgroup v2 structure where each agent maps to a parent cgroup and each tool call spawns a child sub-cgroup, isolating the stable framework baseline (~185 MB) from tool-driven bursts; (2) in-kernel eBPF enforcement via sched_ext for CPU scheduling priority and memcg_bpf_ops hooks (e.g., get_high_delay_ms) for graduated memory throttling instead of termination; and (3) runtime-adaptive policies where eBPF programs trace process creation and memory allocation in-kernel to detect tool-call boundaries automatically, applying priority-based throttling and freezing without user-space round-trips.
Audit the agent runtime structure. Identify the agent framework process (Python/Node), its child processes per tool call (bash, pytest, git, compilers), and the container runtime (Docker, gVisor, Firecracker). Map the process tree to understand which PIDs belong to the framework baseline vs. tool-call bursts.
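A minimal sketch of the process-tree mapping, assuming a standard `ps` is available. The hypothetical helper `descendants_of` reads `PID PPID` pairs on stdin and prints every PID whose ancestry leads back to the given framework PID; everything it prints is a candidate tool-call process.

```bash
# stdin: one "PID PPID" pair per line; $1: PID of the agent framework.
# Prints each descendant PID on its own line, in input order.
descendants_of() {
    awk -v root="$1" '
        { parent[$1] = $2; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                pid = order[i]; p = pid
                # Walk up the ancestry chain until we reach root or run out
                while ((p in parent) && p != root) {
                    p = parent[p]
                    if (p == root) { print pid; break }
                }
            }
        }'
}
```

On a live system this might be fed with `ps -eo pid=,ppid= | descendants_of "$AGENT_PID"`, where `AGENT_PID` is whatever PID the audit identified as the framework process.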
Create a hierarchical cgroup v2 layout. Under the container's top-level cgroup, create a parent cgroup per agent session and child sub-cgroups per tool-call type. Enable the memory, cpu, and pids controllers in each parent's cgroup.subtree_control so they are available to its children:
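# Create the session cgroup (its own parent must already pass down memory, cpu, and pids)
mkdir -p /sys/fs/cgroup/agent-session-01
# Enable controllers for children; this write fails with EBUSY if the
# session cgroup itself still holds processes (cgroup v2 no-internal-process rule)
echo "+memory +cpu +pids" > /sys/fs/cgroup/agent-session-01/cgroup.subtree_control
# Create the tool-call sub-cgroup
mkdir /sys/fs/cgroup/agent-session-01/tool-call-current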
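The layout can be wrapped in a small helper. The sketch below is one possible shape, not the paper's tooling. The function name `create_agent_hierarchy` and the extra `framework` leaf are assumptions of this example. The leaf is there because cgroup v2 does not allow a non-root cgroup to both hold processes and enable controllers for its children, so the framework process needs a leaf of its own next to the tool-call leaf. On a real system the root argument would be a delegated directory under /sys/fs/cgroup whose own cgroup.subtree_control already lists these controllers.

```bash
# Hypothetical helper: build <root>/<session>/{framework,tool-call-current}
# and enable controllers for the two leaves. Prints the session cgroup path.
create_agent_hierarchy() {
    local root="$1" session="$2"
    local cg="${root}/${session}"
    mkdir -p "${cg}/framework" "${cg}/tool-call-current" || return 1
    # Distribute memory, cpu, and pids control to the leaves
    echo "+memory +cpu +pids" > "${cg}/cgroup.subtree_control" || return 1
    echo "$cg"
}
```

Taking the root as a parameter keeps the helper easy to dry-run against a scratch directory before pointing it at the real cgroup filesystem.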
Set graduated memory limits on the parent cgroup. Configure memory.high (soft throttle threshold) well below memory.max (hard OOM boundary) to create a throttling zone that buys time before termination:
echo 800M > /sys/fs/cgroup/agent-session-01/memory.high
echo 1200M > /sys/fs/cgroup/agent-session-01/memory.max
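The pair above places memory.high one third below memory.max. When limits are generated rather than hand-written, the same relationship can be computed. This is a small sketch; the helper name `high_from_max` and the default of 33% are assumptions.

```bash
# Derive memory.high (in MiB) from memory.max by leaving a throttle zone
# of pct_below percent. Integer arithmetic, rounds down.
high_from_max() {
    local max_mb="$1" pct_below="${2:-33}"
    echo $(( max_mb * (100 - pct_below) / 100 ))
}
```

For example, `echo "$(high_from_max 1200)M"` yields a value that can be written straight into memory.high.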
Assign tool-call processes to sub-cgroups. When the agent framework spawns a tool call, move the child PID into the tool-call sub-cgroup. Implement this in the agent wrapper script or via an eBPF program that traces fork/exec and auto-assigns based on process ancestry:
echo $TOOL_PID > /sys/fs/cgroup/agent-session-01/tool-call-current/cgroup.procs
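Moving a PID after the tool has started leaves a short window in which it runs in the parent cgroup. A wrapper can avoid that by joining the cgroup first and then exec-ing the tool. The sketch below assumes the wrapper is allowed to write to the target cgroup.procs; `run_in_cgroup` is a hypothetical name.

```bash
# Hypothetical wrapper: run a command inside the given cgroup directory.
# The helper shell writes its own PID into cgroup.procs, then exec replaces
# it with the tool, so the tool and its children start in the sub-cgroup.
run_in_cgroup() {
    local cg="$1"; shift
    # Inside sh -c: $0 is the cgroup path, "$@" is the command to run
    sh -c 'echo $$ > "$0/cgroup.procs" && exec "$@"' "$cg" "$@"
}
```

A call such as `run_in_cgroup /sys/fs/cgroup/agent-session-01/tool-call-current python -m pytest tests/` passes the command as separate arguments, so it is not re-parsed by the shell.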
Set per-tool-call memory limits on sub-cgroups. Apply tighter limits to the tool-call sub-cgroup so that a runaway pytest or build doesn't consume the entire agent budget:
echo 600M > /sys/fs/cgroup/agent-session-01/tool-call-current/memory.high
echo 900M > /sys/fs/cgroup/agent-session-01/tool-call-current/memory.max
Write eBPF programs for in-kernel enforcement (advanced). Use memcg_bpf_ops hooks to customize throttle delays when memory.high is breached, and sched_ext to prioritize latency-sensitive tool calls. Develop with libbpf and BPF CO-RE for portability:
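// Sketch only: memcg_bpf_ops is not upstream, so the section name and hook signature may change.
// get_cgroup_priority() and PRIORITY_HIGH are placeholders for a policy lookup (e.g., a BPF map keyed by cgroup ID).
SEC("memcg/get_high_delay_ms")
int BPF_PROG(throttle_delay, struct mem_cgroup *memcg, unsigned int *delay_ms) {
    u32 priority = get_cgroup_priority(memcg);
    if (priority == PRIORITY_HIGH)
        *delay_ms = 10;   // Brief throttle for high-priority agents
    else
        *delay_ms = 200;  // Aggressive throttle for low-priority agents
    return 0;
}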
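Because these hooks depend on a recent kernel (the notes further down cite Linux 6.15+ plus patches still under review), a setup script might check the kernel release before trying to load them. The sketch below only compares version numbers and cannot tell whether the patches are actually present; `kernel_at_least` is a hypothetical helper.

```bash
# Succeeds if the kernel release (default: uname -r) is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]
kernel_at_least() {
    local want_major="$1" want_minor="$2" release="${3:-$(uname -r)}"
    local major minor
    major="${release%%.*}"        # "6.8.0-45-generic" -> "6"
    minor="${release#*.}"         # -> "8.0-45-generic"
    minor="${minor%%[!0-9]*}"     # -> "8"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]; }
}
```

A caller could use `kernel_at_least 6 15 || echo "falling back to plain memory.high/memory.max"` to choose the user-space path on older kernels.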
Implement tool-call boundary detection. Attach eBPF programs to the sched_process_fork and sched_process_exec tracepoints to detect when the agent framework launches a new tool call. Use BPF maps to track the mapping between PIDs and tool-call sub-cgroups.
Configure priority-based graduated responses. Define a policy hierarchy: first throttle via memory.high delays, then freeze low-priority sub-cgroups via cgroup.freeze, and only OOM-kill as a last resort. Store priority mappings in BPF maps shared between the kernel and a lightweight user-space daemon.
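On kernels without the eBPF hooks, the same ladder could be approximated from user space. The sketch below is an illustrative policy function only: the name `choose_response`, the high/low priority labels, and the 90%-of-max freeze trigger are all assumptions. It never returns a kill step, because the kernel's OOM killer already acts as the last resort at memory.max.

```bash
# Pick the next step on the throttle -> freeze ladder.
# Args: current usage, memory.high, memory.max (same unit), priority (high|low).
choose_response() {
    local current="$1" high="$2" max="$3" priority="$4"
    if [ "$current" -lt "$high" ]; then
        echo none
    elif [ "$priority" = low ] && [ "$current" -ge $(( max * 90 / 100 )) ]; then
        echo freeze      # caller would write 1 to the sub-cgroup's cgroup.freeze
    else
        echo throttle    # the kernel already throttles above memory.high
    fi
}
```

A frozen sub-cgroup is resumed by writing 0 to the same cgroup.freeze file once pressure subsides.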
Deploy a lightweight user-space daemon for lifecycle management. Write a small daemon that manages cgroup creation/teardown for each tool call, updates BPF map entries for policy configuration, and handles cgroup cleanup when tool calls complete. This daemon does not make enforcement decisions -- it only manages structure.
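The teardown half of that lifecycle is where stale cgroups tend to accumulate, because rmdir on a cgroup fails while any process is still inside it. A minimal sketch of a cleanup routine follows, assuming cgroup.kill is available (Linux 5.14+); `teardown_cgroup` and its retry count are hypothetical.

```bash
# Remove a tool-call cgroup, killing any stragglers first.
# Args: cgroup path, optional number of 100 ms retries (default 50).
teardown_cgroup() {
    local cg="$1" tries="${2:-50}"
    [ -d "$cg" ] || return 0          # already gone
    # cgroup.kill sends SIGKILL to every process in the subtree
    if [ -w "$cg/cgroup.kill" ]; then
        echo 1 > "$cg/cgroup.kill"
    fi
    # Killed processes take a moment to exit, so retry rmdir briefly
    while [ "$tries" -gt 0 ]; do
        rmdir "$cg" 2>/dev/null && return 0
        tries=$((tries - 1))
        sleep 0.1
    done
    return 1
}
```

A non-zero return tells the daemon that the cgroup is still busy and should be queued for another attempt.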
Monitor and tune using cgroup pressure metrics. Read memory.pressure, cpu.pressure, and memory.events from each sub-cgroup to observe throttle frequency, OOM events, and pressure stall durations. Adjust memory.high thresholds based on observed tool-call profiles for the specific agent framework.
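Both file types are plain text and easy to read from a script: memory.events holds one `key value` pair per line, and the pressure files hold a `some` line and a `full` line with `avg10=`, `avg60=`, `avg300=`, and `total=` fields. The two helpers below are a sketch with made-up names.

```bash
# Print the counter for one key (high, max, oom, oom_kill, ...) from a
# memory.events or memory.events.local file.
memory_event_count() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Print the avg10 value of the "some" or "full" line from a pressure file
# such as memory.pressure or cpu.pressure.
pressure_avg10() {
    awk -v kind="$2" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' "$1"
}
```

Sampling `memory_event_count "$CG/memory.events" high` before and after a tool call shows how many times that call crossed the throttle threshold.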
Example 1: Configuring cgroup limits for a SWE-bench agent container
User: "I'm running AI coding agents on SWE-bench tasks in Docker containers. They keep getting OOM-killed during test execution. How do I set up resource controls?"
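Approach: Create a per-session cgroup with a tool-call sub-cgroup, set memory.high below memory.max at both levels, and launch each tool call through a wrapper that moves it into the sub-cgroup.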
Output:
#!/bin/bash
# setup-agent-cgroups.sh -- Run inside the container or via Docker's cgroup parent
set -euo pipefail
: "${SESSION_ID:?SESSION_ID must be set}"

AGENT_CGROUP="/sys/fs/cgroup/agent-session-${SESSION_ID}"
TOOLCALL_CGROUP="${AGENT_CGROUP}/tool-call"

# Create hierarchy
mkdir -p "$TOOLCALL_CGROUP"
echo "+memory +cpu +pids" > "${AGENT_CGROUP}/cgroup.subtree_control"

# Agent-level limits: soft throttle at 800 MiB, hard kill at 1200 MiB
echo 838860800 > "${AGENT_CGROUP}/memory.high"    # 800 MiB
echo 1258291200 > "${AGENT_CGROUP}/memory.max"    # 1200 MiB

# Tool-call limits: tighter bounds for individual commands
echo 629145600 > "${TOOLCALL_CGROUP}/memory.high" # 600 MiB
echo 943718400 > "${TOOLCALL_CGROUP}/memory.max"  # 900 MiB

# Agent wrapper: the helper shell joins the sub-cgroup, then execs the command
run_tool_call() {
    local cmd="$1"
    sh -c "echo \$\$ > ${TOOLCALL_CGROUP}/cgroup.procs && exec $cmd"
}

# Example: run pytest in the tool-call sub-cgroup
# (--timeout requires the pytest-timeout plugin)
run_tool_call "python -m pytest tests/ -x --timeout=300"
Example 2: eBPF-based auto-detection of tool-call boundaries
User: "I want to automatically detect when my agent spawns tool calls and assign them to separate cgroups without modifying the agent framework code."
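Approach: Attach an eBPF program to the sched_process_fork tracepoint, look up the parent PID in a map of known agent framework processes, and emit a ring buffer event so a user-space daemon can migrate the child into the tool-call cgroup.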
Output:
// tool_call_detect.bpf.c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// Event delivered to the user-space daemon for each detected tool call
struct tool_call_event {
    u64 cgroup_id;
    u32 child_pid;
    u32 pad;            // explicit padding so the whole struct is initialized
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 256);
    __type(key, u32);   // PID of agent framework
    __type(value, u64); // cgroup ID for tool-call sub-cgroup
} agent_pids SEC(".maps");

// Ring buffer carrying tool_call_event records to user space
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} events SEC(".maps");

SEC("tp/sched/sched_process_fork")
int detect_tool_call(struct trace_event_raw_sched_process_fork *ctx) {
    u32 parent_pid = ctx->parent_pid;
    u32 child_pid = ctx->child_pid;
    u64 *tool_cgroup_id = bpf_map_lookup_elem(&agent_pids, &parent_pid);
    if (tool_cgroup_id) {
        // Parent is a known agent framework process.
        // Signal user-space daemon to migrate child_pid to tool-call cgroup.
        // (Use a ring buffer event to notify the daemon)
        struct tool_call_event evt = {
            .cgroup_id = *tool_cgroup_id,
            .child_pid = child_pid,
            .pad = 0,
        };
        bpf_ringbuf_output(&events, &evt, sizeof(evt), 0);
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
Example 3: Multi-tenant isolation for concurrent agent sessions
User: "I need to run 8 AI agents concurrently on a 16GB machine. How do I prevent one agent's memory spike from killing others?"
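Approach: Split 85% of total memory evenly across the agents as the soft limit (memory.high) and 95% as the hard limit (memory.max), then give each agent's tool-call sub-cgroup 70% of the agent's soft budget.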
Output:
#!/bin/bash
# multi-tenant-setup.sh
TOTAL_MEM_MB=16384
NUM_AGENTS=8
PER_AGENT_HIGH=$((TOTAL_MEM_MB * 85 / 100 / NUM_AGENTS)) # 1740 MB soft
PER_AGENT_MAX=$((TOTAL_MEM_MB * 95 / 100 / NUM_AGENTS))  # 1945 MB hard

for i in $(seq 1 $NUM_AGENTS); do
    CGROUP="/sys/fs/cgroup/agent-${i}"
    mkdir -p "${CGROUP}/tool-call"
    echo "+memory +cpu +pids" > "${CGROUP}/cgroup.subtree_control"
    echo "${PER_AGENT_HIGH}M" > "${CGROUP}/memory.high"
    echo "${PER_AGENT_MAX}M" > "${CGROUP}/memory.max"

    # Tool-call sub-cgroup gets 70% of agent budget
    TOOL_HIGH=$((PER_AGENT_HIGH * 70 / 100))
    echo "${TOOL_HIGH}M" > "${CGROUP}/tool-call/memory.high"
    echo "${PER_AGENT_MAX}M" > "${CGROUP}/tool-call/memory.max"

    # Monitor pressure for adaptive tuning
    echo "Monitor: cat ${CGROUP}/memory.pressure"
done

echo "Provisioned $NUM_AGENTS agents with ${PER_AGENT_HIGH}MB soft / ${PER_AGENT_MAX}MB hard limits"
Set memory.high 30-40% below memory.max to create a throttling buffer zone. This gives the system time to reclaim memory before triggering OOM kills, preserving expensive LLM context.
Profile tool calls with memory.peak and memory.events.
Use cgroup.freeze for low-priority agents under memory pressure instead of killing them. Frozen agents can resume once pressure subsides, avoiding the cost of re-executing LLM inference.
Read memory.events.local to track high (throttle triggers), max (hard limit hits), and oom_kill counts per sub-cgroup to identify which tool calls cause spikes.
Do not set memory.max without memory.high. Without the throttle threshold, the kernel jumps straight from unlimited allocation to OOM kill with no graceful degradation.
Do not rely on user-space polling (of /proc or cgroup files) for burst detection. Memory spikes in agent tool calls last 1-2 seconds; user-space polling loops have 10-100ms latency plus reaction time, missing the burst window.
If memory.events shows oom_kill counts rising, the gap between memory.high and memory.max is too small. Increase the throttle zone or lower memory.high to trigger throttling earlier.
Overly aggressive memory.high settings can stall an agent indefinitely. Set a timeout in the agent wrapper that kills and restarts the tool call if it remains throttled beyond a threshold (e.g., 30 seconds).
memcg_bpf_ops requires Linux 6.15+ with specific patches (currently under upstream review as of the paper). Fall back to user-space cgroup management with memory.high/memory.max on older kernels -- this sacrifices sub-second responsiveness but retains the graduated approach.
Use cgroup.kill to terminate stragglers before rmdir, or implement a reaper in the lifecycle daemon.
In-kernel enforcement via memcg_bpf_ops and sched_ext requires Linux 6.15+ with patches that are not yet upstream. The cgroup v2 hierarchy approach works on Linux 5.8+, but without in-kernel enforcement hooks.
Paper: Zheng et al., "AgentCgroup: Understanding and Controlling OS Resources of AI Agents" (2026). arXiv: 2602.09345.
Key sections: Section 3 for workload characterization data (memory spike ratios, latency breakdowns), Section 4 for the three-mismatch analysis, and Section 5 for the hierarchical cgroup + eBPF architecture.