skills/interpreters/SKILL.md
Bytecode interpreter and JIT compiler skill for implementing language runtimes in C/C++. Use when designing bytecode dispatch loops (switch, computed goto, threaded code), implementing stack-based or register-based VMs, adding a simple JIT using mmap/mprotect, or understanding performance trade-offs in interpreter design. Activates on queries about bytecode VMs, dispatch loops, computed goto, JIT compilation basics, tracing JITs, or implementing a scripting language runtime.
npx skillsauth add awfixers-stuff/opencode-config interpretersInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Guide agents through implementing efficient bytecode interpreters and simple JITs in C/C++: dispatch strategies, VM architecture choices, and performance patterns.
| Style | Description | Examples | |-------|-------------|---------| | Stack-based | Operands on a value stack; compact bytecode | JVM, CPython, WebAssembly | | Register-based | Operands in virtual registers; fewer instructions | Lua 5+, Dalvik | | Direct threading | Each "instruction" is a function call | Some Forth implementations | | Continuation-passing | Interpreter functions return continuations | Academic |
Stack-based: easier to implement, compile to; code generation is simpler. More instructions per expression. Register-based: fewer dispatch iterations; needs register allocation in the compiler; better cache behaviour for complex expressions.
while (1) {
uint8_t op = *ip++;
switch (op) {
case OP_LOAD: push(constants[*ip++]); break;
case OP_ADD: { Value b = pop(); Value a = pop(); push(a + b); } break;
case OP_HALT: return;
// ...
}
}
Problem: switch compiles to a single indirect branch from a jump table. Modern CPUs can mispredict it heavily because the same indirect branch is used for all opcodes.
// Table of label addresses
static const void *dispatch_table[] = {
[OP_LOAD] = &&op_load,
[OP_ADD] = &&op_add,
[OP_HALT] = &&op_halt,
// ...
};
#define DISPATCH() goto *dispatch_table[*ip++]
DISPATCH(); // start
op_load:
push(constants[*ip++]);
DISPATCH();
op_add: {
Value b = pop(); Value a = pop(); push(a + b);
DISPATCH();
}
op_halt:
return;
Each opcode ends with its own indirect branch. The CPU can train the branch predictor per-opcode, dramatically improving prediction rates.
Note: &&label is a GCC/Clang extension, not standard C. Use #ifdef __GNUC__ to guard and fall back to switch for other compilers.
Each bytecode word is a function pointer or label address; the VM is the fetch-decode-execute loop itself.
typedef void (*Handler)(VM *vm);
// Bytecode is an array of handlers
Handler bytecode[] = { op_load, op_push_1, op_add, op_halt };
for (int i = 0; ; i++) {
bytecode[i](vm);
}
Tagged pointer: Store type tag in low bits of pointer (pointer alignment guarantees ≥ 2 bits free):
typedef uintptr_t Value;
#define TAG_INT 0x0
#define TAG_FLOAT 0x1
#define TAG_PTR 0x2
#define TAG_MASK 0x3
#define INT_VAL(v) ((int64_t)(v) >> 2)
#define FLOAT_VAL(v) (*(float*)((v) & ~TAG_MASK))
#define IS_INT(v) (((v) & TAG_MASK) == TAG_INT)
NaN boxing (64-bit): Store non-double values in NaN bit patterns:
// IEEE 754 quiet NaN: exponent all 1s, mantissa != 0
// Use high mantissa bits as type tag, low 48 bits as payload
// Allows pointer/int/bool/nil to fit in a double-sized slot
Used by V8 (formerly), LuaJIT, JavaScriptCore.
#define STACK_SIZE 4096
Value stack[STACK_SIZE];
Value *sp = stack; // stack pointer
#define PUSH(v) (*sp++ = (v))
#define POP() (*--sp)
#define TOP() (sp[-1])
#define PEEK(n) (sp[-(n)-1])
// Check for overflow
#define PUSH_SAFE(v) do { \
if (sp >= stack + STACK_SIZE) { vm_error("stack overflow"); } \
PUSH(v); \
} while(0)
Inline caching speeds up property lookups and method dispatch by caching the last observed type at each call site.
struct CallSite {
Type cached_type; // Last observed receiver type
void *cached_method; // Cached function pointer
int miss_count; // Number of misses
};
void invoke_method(VM *vm, CallSite *cs, Value receiver, ...) {
Type t = GET_TYPE(receiver);
if (t == cs->cached_type) {
// Cache hit: direct call, no lookup
cs->cached_method(vm, receiver, ...);
} else {
// Cache miss: look up, update cache
void *method = lookup_method(t, name);
cs->cached_type = t;
cs->cached_method = method;
cs->miss_count++;
method(vm, receiver, ...);
}
}
Polymorphic IC (PIC): cache up to N (typ. 4) type-method pairs.
For x86-64: allocate executable memory, write machine code bytes, call it.
#include <sys/mman.h>
#include <string.h>
typedef int (*JitFn)(int a, int b);
JitFn jit_compile_add(void) {
// x86-64: add rdi, rsi; mov rax, rdi; ret
static const uint8_t code[] = {
0x48, 0x01, 0xF7, // add rdi, rsi
0x48, 0x89, 0xF8, // mov rax, rdi
0xC3 // ret
};
void *mem = mmap(NULL, sizeof(code),
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED) return NULL;
memcpy(mem, code, sizeof(code));
// On some systems (macOS Apple Silicon): must use mprotect
// mprotect(mem, sizeof(code), PROT_READ | PROT_EXEC);
return (JitFn)mem;
}
On macOS Apple Silicon (M-series): use pthread_jit_write_protect_np() or MAP_JIT flag.
register Value *sp asm("r15") — GCC global register variable)perf or sampling to find dispatch overhead vs actual workFor a benchmark of dispatch strategies, see references/benchmarks.md.
skills/profilers/linux-perf to profile the interpreter dispatch loopskills/low-level-programming/assembly-x86 to understand JIT outputskills/runtimes/fuzzing to fuzz the bytecode parser/loaderskills/compilers/llvm for LLVM IR-based JIT (MCJIT / ORC JIT)development
Use when starting dev servers, watchers, tilt, or any process expected to outlive the conversation. Provides zmx session management patterns for long-lived processes.
development
Zig testing skill for writing and running tests. Use when using zig build test, writing comptime tests, using test filters, working with test allocators to detect leaks, or using Zig's built-in fuzz testing (0.14+). Activates on queries about Zig tests, zig test, zig build test, comptime testing, test allocators, Zig fuzz testing, or detecting memory leaks in Zig tests.
development
Zig debugging skill. Use when debugging Zig programs with GDB or LLDB, interpreting Zig runtime panics, using std.debug.print for tracing, configuring debug builds, or debugging Zig programs in VS Code. Activates on queries about debugging Zig, Zig panics, zig gdb, zig lldb, std.debug.print, Zig stack traces, or Zig error return traces.
tools
Zig cross-compilation skill. Use when cross-compiling Zig programs to different targets, using Zig's built-in cross-compilation for embedded, WASM, Windows, ARM, or using zig cc to cross-compile C code without a system cross-toolchain. Activates on queries about Zig cross-compilation, zig target triples, zig cc cross-compile, Zig embedded targets, or Zig WASM.