skills/low-level-programming/assembly-arm/SKILL.md
AArch64 and ARM assembly skill for reading and writing ARM assembly code. Use when reading GCC/Clang output for AArch64 or ARM Thumb targets, writing inline asm in C/C++, understanding the ARM ABI (AAPCS64/AAPCS), or debugging register and stack state on ARM hardware or QEMU. Activates on queries about AArch64 assembly, ARM Thumb, NEON/SVE SIMD, ARM calling convention, inline asm for ARM, or reading ARM disassembly.
npx skillsauth add mohitmishra786/low-level-dev-skills assembly-armInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Guide agents through AArch64 (64-bit) and ARM (32-bit Thumb) assembly: registers, calling conventions, inline asm, and NEON/SVE SIMD patterns.
# AArch64 (native or cross-compile)
aarch64-linux-gnu-gcc -S -O2 foo.c -o foo.s
# 32-bit ARM Thumb
arm-linux-gnueabihf-gcc -S -O2 -mthumb foo.c -o foo.s
# From objdump
aarch64-linux-gnu-objdump -d -S prog
# From GDB on target
(gdb) disassemble /s main
| Register | Alias | Role |
|----------|-------|------|
| x0–x7 | — | Arguments 1–8 and return values |
| x8 | xr | Indirect result location (struct return) |
| x9–x15 | — | Caller-saved temporaries |
| x16–x17 | ip0, ip1 | Intra-procedure-call temporaries (used by linker) |
| x18 | pr | Platform register (reserved on some OS) |
| x19–x28 | — | Callee-saved |
| x29 | fp | Frame pointer (callee-saved) |
| x30 | lr | Link register (return address) |
| sp | — | Stack pointer (must be 16-byte aligned at call) |
| pc | — | Program counter (not directly accessible) |
| xzr | wzr | Zero register (reads as 0, writes discarded) |
| v0–v7 | q0–q7 | FP/SIMD args and return |
| v8–v15 | — | Callee-saved SIMD (lower 64 bits only) |
| v16–v31 | — | Caller-saved temporaries |
Width variants: x0 (64-bit), w0 (32-bit, zero-extends to 64), h0 (16), b0 (8).
Integer/pointer args: x0–x7
Float/SIMD args: v0–v7
Return: x0 (int), x0+x1 (128-bit), v0 (float/SIMD)
Callee-saved: x19–x28, x29 (fp), x30 (lr), v8–v15 (lower 64 bits)
Caller-saved: everything else
Stack must be 16-byte aligned at any bl or blr instruction.
| Instruction | Effect |
|-------------|--------|
| mov x0, x1 | Copy register |
| mov x0, #42 | Load immediate |
| movz x0, #0x1234, lsl #16 | Move zero-extended with shift |
| movk x0, #0xabcd | Move with keep (partial update) |
| ldr x0, [x1] | Load 64-bit from address in x1 |
| ldr x0, [x1, #8] | Load from x1+8 |
| str x0, [x1, #8] | Store x0 to x1+8 |
| ldp x0, x1, [sp, #16] | Load pair (two regs at once) |
| stp x29, x30, [sp, #-16]! | Store pair, pre-decrement sp |
| add x0, x1, x2 | x0 = x1 + x2 |
| add x0, x1, #8 | x0 = x1 + 8 |
| sub x0, x1, x2 | x0 = x1 - x2 |
| mul x0, x1, x2 | x0 = x1 * x2 |
| sdiv x0, x1, x2 | Signed divide |
| udiv x0, x1, x2 | Unsigned divide |
| cmp x0, x1 | Set flags for x0 - x1 |
| cbz x0, label | Branch if x0 == 0 |
| cbnz x0, label | Branch if x0 != 0 |
| bl func | Branch with link (call) |
| blr x0 | Branch with link to address in x0 |
| ret | Return (branch to x30) |
| ret x0 | Return to address in x0 |
| adrp x0, symbol | PC-relative page address |
| add x0, x0, :lo12:symbol | Low 12 bits of symbol offset |
// Non-leaf function
stp x29, x30, [sp, #-32]! // save fp, lr; allocate 32 bytes
mov x29, sp // set frame pointer
stp x19, x20, [sp, #16] // save callee-saved registers
// ... body ...
ldp x19, x20, [sp, #16] // restore
ldp x29, x30, [sp], #32 // restore fp, lr; deallocate
ret
// Leaf function (no calls, no callee-saved regs needed)
// Can use red zone (no rsp adjustment) — but AArch64 has no red zone
sub sp, sp, #16 // allocate locals
// ... body ...
add sp, sp, #16
ret
// Barrier
__asm__ volatile ("dmb ish" ::: "memory");
// Load acquire
static inline int load_acquire(volatile int *p) {
int val;
__asm__ volatile ("ldar %w0, %1" : "=r"(val) : "Q"(*p));
return val;
}
// Store release
static inline void store_release(volatile int *p, int val) {
__asm__ volatile ("stlr %w1, %0" : "=Q"(*p) : "r"(val));
}
// Read system counter
static inline uint64_t read_cntvct(void) {
uint64_t val;
__asm__ volatile ("mrs %0, cntvct_el0" : "=r"(val));
return val;
}
AArch64-specific constraints:
"Q" — memory operand suitable for exclusive/acquire/release instructions"r" — any general-purpose register"w" — any FP/SIMD register#include <arm_neon.h>
// Add 4 floats at once
float32x4_t a = vld1q_f32(arr_a); // load 4 floats
float32x4_t b = vld1q_f32(arr_b);
float32x4_t c = vaddq_f32(a, b);
vst1q_f32(result, c);
// Horizontal sum
float32x4_t sum = vpaddq_f32(c, c);
sum = vpaddq_f32(sum, sum);
float total = vgetq_lane_f32(sum, 0);
Naming convention: v<op><q>_<type>
q suffix: 128-bit (quad) vector_f32: float32, _s32: int32, _u8: uint8, etc.For a register reference, see references/reference.md.
skills/low-level-programming/assembly-x86 for x86-64 assemblyskills/compilers/cross-gcc for cross-compilation toolchainskills/debuggers/gdb for debugging ARM code with gdbserverdevelopment
Zig testing skill for writing and running tests. Use when using zig build test, writing comptime tests, using test filters, working with test allocators to detect leaks, or using Zig's built-in fuzz testing (0.14+). Activates on queries about Zig tests, zig test, zig build test, comptime testing, test allocators, Zig fuzz testing, or detecting memory leaks in Zig tests.
development
Zig debugging skill. Use when debugging Zig programs with GDB or LLDB, interpreting Zig runtime panics, using std.debug.print for tracing, configuring debug builds, or debugging Zig programs in VS Code. Activates on queries about debugging Zig, Zig panics, zig gdb, zig lldb, std.debug.print, Zig stack traces, or Zig error return traces.
tools
Zig cross-compilation skill. Use when cross-compiling Zig programs to different targets, using Zig's built-in cross-compilation for embedded, WASM, Windows, ARM, or using zig cc to cross-compile C code without a system cross-toolchain. Activates on queries about Zig cross-compilation, zig target triples, zig cc cross-compile, Zig embedded targets, or Zig WASM.
development
Zig comptime skill for compile-time evaluation and metaprogramming. Use when using comptime parameters, comptime types, generics via anytype, comptime reflection with @typeInfo, or metaprogramming patterns that replace C++ templates. Activates on queries about Zig comptime, compile-time evaluation, Zig generics, anytype, @typeInfo, comptime types, or Zig metaprogramming.